parlor — What is it?

Parlor is an on-device, real-time multimodal AI app that enables natural voice and vision conversations, eliminating server costs and improving privacy.

⭐ 1,250 Stars 🍴 125 Forks HTML Apache-2.0 Author: fikrikarim
Source: README

Why it matters

Parlor is gaining attention for its fully on-device approach to AI, which addresses both privacy concerns and server costs. It pairs recent on-device models, Gemma 3n E2B for understanding and Kokoro for speech synthesis, making it well suited to language learning and real-time interaction.

Source: README, project traits

Core Features

Multimodal AI

Parlor combines voice and vision capabilities, allowing users to have conversations with an AI using both speech and visual inputs.

Source: README
On-device processing

The AI runs entirely on the user's machine, ensuring privacy and eliminating the need for server infrastructure.

Source: README
Real-time interaction

Parlor supports real-time processing of voice and vision data, providing immediate responses to user inputs.

Source: README
Voice Activity Detection

Integrated voice activity detection enables hands-free operation, with no push-to-talk required.

Source: README
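To illustrate the idea behind voice activity detection, here is a deliberately simplified energy-threshold sketch. This is a stand-in for illustration only, not the Silero VAD model Parlor actually uses; the frame sizes and threshold are arbitrary.

```python
# Simplified frame-level voice activity detection (illustrative only;
# Parlor uses the Silero VAD model, not an energy threshold).
import math

def frame_energy(samples):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def detect_speech(frames, threshold=0.02):
    """Return indices of frames whose RMS energy exceeds the threshold."""
    return [i for i, frame in enumerate(frames) if frame_energy(frame) > threshold]

silence = [0.0] * 160          # a frame of digital silence
speech = [0.1, -0.1] * 80      # a frame with audible energy
print(detect_speech([silence, speech, silence]))  # → [1]
```

A real VAD model classifies frames with a trained neural network rather than a fixed threshold, which is what makes hands-free turn-taking robust to background noise.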
Sentence-level TTS streaming

Text-to-speech output is streamed sentence by sentence, so audio playback begins before the full response is generated, reducing perceived latency.

Source: README
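The core of sentence-level streaming is buffering the model's token stream and emitting each sentence as soon as it is complete. A minimal sketch of that chunking logic follows; the sentence-boundary regex and function name are illustrative assumptions, not Parlor's actual implementation.

```python
# Sketch of sentence-level chunking for TTS streaming: yield each
# complete sentence as text chunks arrive, instead of waiting for
# the full response. Illustrative only, not Parlor's actual code.
import re

SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def stream_sentences(chunks):
    """Yield complete sentences as streamed text chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence          # hand off to TTS immediately
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()        # flush whatever remains

tokens = ["Hello ther", "e! How can", " I help you", " today?"]
print(list(stream_sentences(tokens)))
# → ['Hello there!', 'How can I help you today?']
```

In a real pipeline each yielded sentence would be passed to the TTS engine while the language model continues generating, which is what lets audio start before the response is finished.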

Architecture

Parlor's architecture is a local client-server model: a browser-based frontend communicates over a WebSocket with a FastAPI server running on the same machine. The server uses Gemma 3n E2B for speech and vision understanding and Kokoro for text-to-speech. The project employs a modular design with separate components for the server, text-to-speech, and frontend UI.

Source: Code tree + README

Tech Stack

  • Infra: local machine with Apple Silicon (macOS) or a Linux GPU
  • Key dependencies: Gemma 3n E2B, Kokoro, LiteRT-LM, Silero VAD
  • Language: Python
  • Framework: FastAPI for the server, HTML for the frontend UI

Source: Code tree, README

Quick Start

```shell
git clone https://github.com/fikrikarim/parlor.git
cd parlor

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

cd src
uv sync
uv run server.py
```

Open http://localhost:8000, grant camera and microphone access, and start talking. Models are downloaded automatically on first run (~2.6 GB for Gemma 3n E2B, plus TTS models).
Source: README Installation/Quick Start

Use Cases

Parlor is suitable for language learning, interactive AI applications, and privacy-conscious users who prefer on-device processing over cloud-based solutions.

Source: README

Strengths & Limitations

Strengths

  • Enhances privacy by processing data on the user's device.
  • Reduces server costs and infrastructure requirements.
  • Supports real-time interaction and multimodal communication.

Limitations

  • Requires a capable local machine (Apple Silicon or a supported GPU).
  • On-device models may trail cloud-based solutions in capability.
Source: Synthesis of README, code structure and dependencies

Latest Release

Not enough information.

Source: GitHub Releases

Verdict

Parlor is a promising project for developers and users interested in on-device AI solutions. It offers a unique combination of privacy, real-time interaction, and multimodal capabilities, making it particularly valuable for language learning and interactive applications. It is best suited for technically inclined individuals or teams with access to powerful local hardware.

Transparency Notice
This page is auto-generated by AI (a large language model) from the following public materials: GitHub README, code tree, dependency files and release notes. Analyzed at: 2026-04-19 10:25. Quality score: 85/100.

Data sources: README, GitHub API, dependency files