MemVault: Building a 3-Tier LLM Memory System That Cuts Token Costs by 56%
Learn how to build MemVault, a 3-tier LLM memory system that reduces token costs by 56% and gives the AI persistent knowledge about users. The demo covers real-time cost tracking, a visualization of what the AI knows, and a smart model router.
I built MemVault, a complete, working 3-tier memory architecture for LLM applications that reduces token costs by 56% while giving AI persistent knowledge about users across sessions.
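The abstract does not spell out how the three tiers divide the work. What follows is a minimal sketch of one plausible split, not MemVault's actual implementation. It assumes tier 1 holds the most recent turns, tier 2 holds structured user facts, and tier 3 holds older turns for retrieval. Plain Python containers stand in for Redis, PostgreSQL, and ChromaDB, word overlap stands in for embedding search, and a word count stands in for a tokenizer. All class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


@dataclass
class TieredMemory:
    # Tier 1: short-term window of recent turns (stand-in for Redis).
    recent_turns: list[str] = field(default_factory=list)
    # Tier 2: structured, persistent user facts (stand-in for PostgreSQL).
    user_facts: dict[str, str] = field(default_factory=dict)
    # Tier 3: older turns kept for retrieval (stand-in for ChromaDB).
    archive: list[str] = field(default_factory=list)

    def add_turn(self, text: str, keep_recent: int = 2) -> None:
        """Append a turn and move turns beyond the recent window into the archive."""
        self.recent_turns.append(text)
        while len(self.recent_turns) > keep_recent:
            self.archive.append(self.recent_turns.pop(0))

    def search_archive(self, query: str, top_k: int = 1) -> list[str]:
        """Rank archived turns by words shared with the query (not real embeddings)."""
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(doc.lower().split())), doc) for doc in self.archive]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [doc for _, doc in ranked[:top_k]]

    def build_context(self, query: str) -> str:
        """Assemble a compact prompt context from all three tiers."""
        facts = [f"{key}: {value}" for key, value in self.user_facts.items()]
        return "\n".join(facts + self.search_archive(query) + self.recent_turns)


history = [
    "I prefer Python for scripting",
    "My dog is called Rex",
    "What is the weather today",
    "Remind me about my dog",
]
memory = TieredMemory(user_facts={"name": "Ada"})
for turn in history:
    memory.add_turn(turn)

context = memory.build_context("Remind me about my dog")
full_history_tokens = estimate_tokens("\n".join(history))
context_tokens = estimate_tokens(context)
```

In this toy run the assembled context drops the unrelated archived turn while keeping the user fact and the relevant older turn, so it is shorter than replaying the whole history. The size of any real saving depends on the tokenizer, the retrieval quality, and the conversation length; the sketch makes no claim about reproducing the 56% figure.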
The live demo will show the real-time cost dashboard tracking every token, the interactive D3.js graph visualizing exactly what the AI “knows” about a user, the smart model router switching between models automatically, and the full Redis, PostgreSQL, and ChromaDB memory pipeline in action.
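The routing rules and pricing behind the model router and cost dashboard are not described here. As a rough illustration only, a router could send short, simple prompts to a cheaper model and everything else to a larger one, while a tracker accumulates cost per call. The model names, per-token prices, keyword list, and word threshold below are all placeholder assumptions, not values from MemVault or any provider.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_1k_tokens: float  # placeholder price, not a real provider rate


# Hypothetical model tiers used only for this illustration.
SMALL = ModelOption("small-model", 0.0005)
LARGE = ModelOption("large-model", 0.0100)

# Assumed keyword heuristic for prompts that need more reasoning.
REASONING_HINTS = ("why", "explain", "compare", "step by step")


def route(prompt: str, max_small_words: int = 40) -> ModelOption:
    """Pick the large model for long prompts or prompts that hint at reasoning."""
    text = prompt.lower()
    needs_reasoning = any(hint in text for hint in REASONING_HINTS)
    is_long = len(text.split()) > max_small_words
    return LARGE if (needs_reasoning or is_long) else SMALL


class CostTracker:
    """Accumulates per-call cost, the kind of data a cost dashboard could plot."""

    def __init__(self) -> None:
        self.total_usd = 0.0
        self.calls: list[tuple[str, int, float]] = []

    def record(self, model: ModelOption, tokens: int) -> float:
        cost = tokens / 1000 * model.usd_per_1k_tokens
        self.total_usd += cost
        self.calls.append((model.name, tokens, cost))
        return cost


tracker = CostTracker()
simple_choice = route("What time is it?")
complex_choice = route("Explain the difference between a cache hit and a cache miss")
tracker.record(simple_choice, tokens=200)
tracker.record(complex_choice, tokens=1000)
```

A keyword heuristic is the simplest possible router; a production system might instead score prompts with a classifier or fall back to the larger model when the smaller one's answer fails a quality check.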
FastAPI-Streamlit Adaptive RAG system using LangGraph, Qdrant, and MongoDB.
- Redis: an ultra-fast, open-source, in-memory data structure store and NoSQL key/value database. It is a go-to choice for low-latency data operations: Redis operates primarily in memory, delivering sub-millisecond response times for real-time workloads such as session storage, leaderboards, and caching. It is more than a key/value store; it is a versatile data structure server supporting Strings, Hashes, Lists, Sets, Sorted Sets, and JSON. Its Pub/Sub capabilities support message brokering, and optional persistence provides durability. Deploy it as a high-speed cache to offload a primary database, or use it as the primary database for high-throughput microservices.
- PostgreSQL (Postgres): an advanced open-source object-relational database (ORDBMS), built for reliability and extensibility and proven over 35+ years of active development. It adheres to the ACID properties (Atomicity, Consistency, Isolation, Durability), ensuring data integrity for mission-critical workloads. Key features include robust SQL compliance, Multi-Version Concurrency Control (MVCC), and strong extensibility (e.g., custom data types and functions written in multiple languages). Native JSON/JSONB support and the PostGIS extension for geospatial data make it a powerful, flexible choice for complex enterprise applications.
- ChromaDB: an open-source vector (embedding) database built for LLM applications, particularly Retrieval-Augmented Generation (RAG) systems, providing simple, fast semantic search. It simplifies the storage, indexing, and querying of vector embeddings and their metadata, enabling fast Approximate Nearest Neighbor (ANN) similarity search. Developers appreciate its lightweight design: it runs in memory, persistently on top of SQLite 3, or in client/server mode, and it offers Python and JavaScript SDKs. With 5M+ monthly downloads and deep integration with tools like LangChain and LlamaIndex, Chroma is a proven, developer-friendly component for an AI stack.
- FastAPI: a modern, high-performance Python web framework for building APIs with automatic OpenAPI documentation. It is built on Starlette (for async capabilities) and Pydantic (for data validation and serialization). Using standard Python 3.8+ type hints, the framework automatically generates interactive API documentation (Swagger UI/ReDoc) and enforces data validation, which by the project's own estimate reduces developer-induced errors by about 40%. The project describes its performance as on par with Node.js and Go and estimates feature development to be up to 300% faster. It is production-ready and fully supports the OpenAPI and JSON Schema standards for API specifications.
- Groq: a provider of ultra-fast AI inference that uses its custom-built Language Processing Unit (LPU) to accelerate Large Language Models (LLMs) at scale. The LPU Inference Engine is a proprietary chip architected specifically for generative AI and LLMs. According to Groq, its dataflow architecture avoids the memory and compute bottlenecks of traditional GPUs, delivering consistent, ultra-low-latency performance and better energy efficiency. The technology is accessible via the GroqCloud platform or on-premise GroqRack clusters and targets real-time applications for demanding enterprise customers. Groq was founded in 2016 by former Google engineers, including a lead designer of the TPU.