Say Goodbye to RAM Blowups! Memvid Packs Millions of Text Chunks into Video, with Retrieval Faster Than a Blink

Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.

pypi.org/project/memvid/

MIT license

The lightweight, game-changing solution for AI memory at scale

Memvid revolutionizes AI memory management by encoding text data into videos, enabling lightning-fast semantic search across millions of text chunks with sub-second retrieval times. Unlike traditional vector databases that consume massive amounts of RAM and storage, Memvid compresses your knowledge base into compact video files while maintaining instant access to any piece of information.

🎥 Video-as-Database: Store millions of text chunks in a single MP4 file
🔍 Semantic Search: Find relevant content using natural language queries
💬 Built-in Chat: Conversational interface with context-aware responses
📚 PDF Support: Direct import and indexing of PDF documents
🚀 Fast Retrieval: Sub-second search across massive datasets
💾 Efficient Storage: 10x compression compared to traditional databases
🔌 Pluggable LLMs: Works with OpenAI, Anthropic, or local models
🌐 Offline-First: No internet required after video generation
🔧 Simple API: Get started with just 3 lines of code

📖 Digital Libraries: Index thousands of books in a single video file
🎓 Educational Content: Create searchable video memories of course materials
📰 News Archives: Compress years of articles into manageable video databases
💼 Corporate Knowledge: Build company-wide searchable knowledge bases
🔬 Research Papers: Quick semantic search across scientific literature
📝 Personal Notes: Transform your notes into a searchable AI assistant

Video as Database: Store millions of text chunks in a single MP4 file
Instant Retrieval: Sub-second semantic search across massive datasets
10x Storage Efficiency: Video compression reduces memory footprint dramatically
Zero Infrastructure: No database servers, just files you can copy anywhere
Offline-First: Works completely offline once videos are generated
Minimal Dependencies: Core functionality in ~1000 lines of Python
CPU-Friendly: Runs efficiently without GPU requirements
Portable: Single video file contains your entire knowledge base
Streamable: Videos can be streamed from cloud storage

pip install memvid

pip install memvid PyPDF2

# Create a new project directory
mkdir my-memvid-project
cd my-memvid-project

# Create virtual environment
python -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install memvid
pip install memvid

# For PDF support:
pip install PyPDF2

from memvid import MemvidEncoder, MemvidChat

# Create video memory from text chunks
chunks = ["Important fact 1", "Important fact 2", "Historical event details"]
encoder = MemvidEncoder()
encoder.add_chunks(chunks)
encoder.build_video("memory.mp4", "memory_index.json")

# Chat with your memory
chat = MemvidChat("memory.mp4", "memory_index.json")
chat.start_session()
response = chat.chat("What do you know about historical events?")
print(response)

from memvid import MemvidEncoder
import os

# Load documents
encoder = MemvidEncoder(chunk_size=512, overlap=50)

# Add text files
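for file in os.listdir("documents"):
    path = os.path.join("documents", file)
    if not os.path.isfile(path):
        continue  # skip subdirectories
    with open(path, "r", encoding="utf-8") as f:
        encoder.add_text(f.read(), metadata={"source": file})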

# Build optimized video
encoder.build_video(
    "knowledge_base.mp4",
    "knowledge_index.json",
    fps=30,  # Higher FPS = more chunks per second
    frame_size=512  # Larger frames = more data per frame
)
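
The chunk_size and overlap arguments above suggest a sliding-window split of each document. The following is a minimal sketch of that idea, assuming sizes are counted in characters (the source does not say whether Memvid counts characters or tokens). chunk_text is a hypothetical helper written for illustration and is not part of the memvid API.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical usage: windows of 4 characters that overlap by 1
pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(pieces)
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.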

from memvid import MemvidRetriever

# Initialize retriever
retriever = MemvidRetriever("knowledge_base.mp4", "knowledge_index.json")

# Semantic search
results = retriever.search("machine learning algorithms", top_k=5)
for chunk, score in results:
    print(f"Score: {score:.3f} | {chunk[:100]}...")

# Get context window
context = retriever.get_context("explain neural networks", max_tokens=2000)
print(context)
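
To make the (chunk, score) pairs above concrete, here is a rough sketch of top-k ranking by cosine similarity. It is not Memvid's implementation: the acknowledgments name sentence-transformers and FAISS, while this sketch swaps in toy word-count vectors so that it runs with the standard library alone. The names embed, cosine, and search are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top_k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(chunk, cosine(q, embed(chunk))) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


chunks = [
    "neural networks learn layered representations",
    "the stock market closed higher today",
    "gradient descent trains neural networks",
]
results = search("neural networks", chunks, top_k=2)
for chunk, score in results:
    print(f"Score: {score:.3f} | {chunk}")
```

A real embedding model would also match paraphrases that share no words with the query, which word counts cannot do.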

from memvid import MemvidInteractive

# Launch interactive chat UI
interactive = MemvidInteractive("knowledge_base.mp4", "knowledge_index.json")
interactive.run()  # Opens web interface at http://localhost:7860

The examples/file_chat.py script provides a comprehensive way to test Memvid with your own documents:

# Process a directory of documents
python examples/file_chat.py --input-dir /path/to/documents --provider google

# Process specific files
python examples/file_chat.py --files doc1.txt doc2.pdf --provider openai

# Use H.265 compression (requires Docker)
python examples/file_chat.py --input-dir docs/ --codec h265 --provider google

# Custom chunking for large documents
python examples/file_chat.py --files large.pdf --chunk-size 2048 --overlap 32 --provider google

# Load existing memory
python examples/file_chat.py --load-existing output/my_memory --provider google
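
For readers who want to see how such flags fit together, the sketch below builds a hypothetical argparse parser that mirrors only the options shown above. It is not the source of examples/file_chat.py; treating the three input options as mutually exclusive and --provider as required are assumptions made for the illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Chat with your own documents (sketch)")
    # Assumption: exactly one input source is given per run
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input-dir", help="directory of documents to process")
    source.add_argument("--files", nargs="+", help="specific files to process")
    source.add_argument("--load-existing", help="path of a previously built memory")
    # Assumption: the provider must always be named
    parser.add_argument("--provider", required=True, help="LLM provider, e.g. google or openai")
    parser.add_argument("--codec", default=None, help="video codec, e.g. h265")
    parser.add_argument("--chunk-size", type=int, default=None, help="chunk size override")
    parser.add_argument("--overlap", type=int, default=None, help="overlap override")
    return parser


# Parse the "custom chunking" command line from above
args = build_parser().parse_args(
    ["--files", "large.pdf", "--chunk-size", "2048", "--overlap", "32", "--provider", "google"]
)
print(args.files, args.chunk_size, args.overlap, args.provider)
```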

Custom Embeddings

from memvid import MemvidEncoder
from sentence_transformers import SentenceTransformer

# Use custom embedding model
custom_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
encoder = MemvidEncoder(embedding_model=custom_model)

Video Optimization

# For maximum compression
encoder.build_video(
    "compressed.mp4",
    "index.json",
    fps=60,  # More frames per second
    frame_size=256,  # Smaller frames
    video_codec='h265',  # Better compression
    crf=28  # Compression quality (lower = better quality)
)
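
The effect of fps on the resulting file can be estimated with simple arithmetic. The sketch below assumes one text chunk per video frame, which the source does not state explicitly; video_duration_seconds and frame_timestamp are hypothetical helpers, not memvid functions.

```python
def video_duration_seconds(n_chunks: int, fps: int, chunks_per_frame: int = 1) -> float:
    """Length of the video if chunks are laid out frame by frame (assumed layout)."""
    frames = -(-n_chunks // chunks_per_frame)  # ceiling division
    return frames / fps


def frame_timestamp(frame_index: int, fps: int) -> float:
    """Playback position, in seconds, of a given frame."""
    return frame_index / fps


# 1,800 chunks take 60 s of video at 30 fps and 30 s at 60 fps
print(video_duration_seconds(1800, fps=30))
print(video_duration_seconds(1800, fps=60))
# Under the same assumption, chunk number 900 sits 15 s into a 60 fps video
print(frame_timestamp(900, fps=60))
```

Under this assumption, doubling fps halves the playback length for the same number of chunks; it does not by itself change how many frames must be stored.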

Distributed Processing

# Process large datasets in parallel
encoder = MemvidEncoder(n_workers=8)
encoder.add_chunks_parallel(massive_chunk_list)
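
The source does not show what add_chunks_parallel does internally. As a general illustration of the pattern, the sketch below splits a chunk list into batches and processes them with a worker pool from the standard library. split_batches, process_batch, and process_parallel are hypothetical names, and the whitespace normalization is only placeholder work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batches(items: list[str], n_batches: int) -> list[list[str]]:
    """Split items into at most n_batches contiguous batches of near-equal size."""
    size = max(1, -(-len(items) // n_batches))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(batch: list[str]) -> list[str]:
    """Placeholder per-batch work: collapse runs of whitespace in each chunk."""
    return [" ".join(chunk.split()) for chunk in batch]


def process_parallel(items: list[str], n_workers: int = 8) -> list[str]:
    """Process batches concurrently and return results in the original order."""
    batches = split_batches(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process_batch, batches))  # map preserves batch order
    return [chunk for batch in results for chunk in batch]


cleaned = process_parallel(["a   b", "  c ", "d\te"] * 4, n_workers=4)
print(cleaned[:3])
```

Threads keep the example short; for CPU-bound work in Python, a process pool is usually the better fit.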

🐛 Troubleshooting

Common Issues

ModuleNotFoundError: No module named 'memvid'

# Make sure you're using the right Python
which python  # Should show your virtual environment path
# If not, activate your virtual environment:
source venv/bin/activate  # On Windows: venv\Scripts\activate
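
The same check can be done from inside Python, which helps when it is unclear which interpreter an editor or notebook is using. This is a small diagnostic sketch using only the standard library; in_virtualenv and is_installed are hypothetical helper names.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Inside a venv, sys.prefix points at the environment and differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def is_installed(package: str) -> bool:
    """True if the top-level package can be found by this interpreter."""
    return importlib.util.find_spec(package) is not None


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
print("memvid importable:", is_installed("memvid"))
```

If the last line prints False while pip reports memvid as installed, pip and python most likely belong to different environments.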

ImportError: PyPDF2 is required for PDF support

pip install PyPDF2

LLM API Key Issues

# Set the API key for the provider you use (a Google key is shown here;
# OpenAI keys are issued at https://platform.openai.com and go in OPENAI_API_KEY)
export GOOGLE_API_KEY="AIzaSyB1-..."  # macOS/Linux
# Or on Windows:
set GOOGLE_API_KEY=AIzaSyB1-...
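
A script can fail early with a clear message when the key is missing, rather than partway through a chat session. The sketch below is one way to do that; require_env is a hypothetical helper, and the variable name is taken from the export shown above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the chat")
    return value


try:
    api_key = require_env("GOOGLE_API_KEY")
    print("API key found")
except RuntimeError as err:
    print(err)
```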

Large PDF Processing

# For very large PDFs, use smaller chunk sizes
encoder = MemvidEncoder()
encoder.add_pdf("large_book.pdf", chunk_size=400, overlap=50)

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

# Run tests
pytest tests/

# Run with coverage
pytest --cov=memvid tests/

# Format code
black memvid/

Feature	Memvid	Vector DBs	Traditional DBs
Storage Efficiency	⭐⭐⭐⭐⭐	⭐⭐	⭐⭐⭐
Setup Complexity	Simple	Complex	Complex
Semantic Search	✅	✅	❌
Offline Usage	✅	❌	✅
Portability	File-based	Server-based	Server-based
Scalability	Millions	Millions	Billions
Cost	Free	$$$$	$$$

📚 Examples

Check out the examples/ directory for:

Building memory from Wikipedia dumps
Creating a personal knowledge base
Multi-language support
Real-time memory updates
Integration with popular LLMs

📖 Documentation – Comprehensive guides
💬 Discussions – Ask questions
🐛 Issue Tracker – Report bugs
🌟 – Share your projects

📄 License

MIT License – see LICENSE file for details.

🙏 Acknowledgments

Created by Olow304 and the Memvid community.

Built with ❤️ using:

sentence-transformers – State-of-the-art embeddings for semantic search
OpenCV – Computer vision and video processing
qrcode – QR code generation
FAISS – Efficient similarity search
PyPDF2 – PDF text extraction

Special thanks to all contributors who help make Memvid better!

Ready to revolutionize your AI memory management? Install Memvid and start building! 🚀

By YXI.AI

Leave a Reply Cancel reply

You Missed

告别单一语音！Kokoro CLI语音合成：多语言文档直读，声音还能自由混搭

告别RAM爆炸！Memvid把百万文本块塞进视频，检索快过眨眼

终结AI工具记忆断层！OpenMemory实现跨平台无缝协作与90%Token节省

OpenSPG进化论：KAG如何定义下一代逻辑驱动型检索系统

Custom Embeddings

Video Optimization

Distributed Processing

🐛 Troubleshooting

Common Issues

🤝 Contributing

📚 Examples

🔗 Links

📄 License

🙏 Acknowledgments

Releases 2

Packages

By YXI.AI

Related Post

Leave a Reply Cancel reply

You Missed