Transparently proxy your LLM API calls. IR automatically deduplicates redundant context, trims RAG retrieval results, and compresses conversation history — change one line of code and start saving.
Change one line of code; IR transparently takes over your LLM requests
Sends a chat completion request
Optimizes context automatically
Fewer tokens, same quality
Fully compatible with the OpenAI SDK — no business-logic changes needed
```python
# Before
client = OpenAI(
    api_key="sk-xxx",
    base_url="https://api.openai.com/v1",
)
```

```python
# After: rest of the code unchanged, same quality, lower cost
client = OpenAI(
    api_key="ir_live_xxx",
    base_url="https://ir.ngjoo.com/v1",
)
```
Cut cost and keep quality — end to end
Jaccard-similarity detection automatically removes duplicate text fragments in the message list — especially useful for repeated documents retrieved in RAG pipelines.
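As a rough sketch of the idea (an illustration only, with hypothetical names like `dedup_fragments` — not IR's actual implementation), Jaccard similarity over word sets can flag and drop near-duplicate fragments:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def dedup_fragments(fragments: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a fragment only if it is not a near-duplicate of one already kept."""
    kept: list[str] = []
    for frag in fragments:
        if all(jaccard(frag, k) < threshold for k in kept):
            kept.append(frag)
    return kept
```

In a RAG pipeline, the same passage often comes back from several retrieval calls; a threshold around 0.8 drops those repeats while keeping genuinely different chunks.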
Intelligently trims retrieval results to keep only the chunks most relevant to the query, cutting irrelevant context that distracts the model and wastes tokens.
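The trimming step can be sketched as rank-then-keep. Here word overlap stands in for a real relevance model, and the names `overlap_score` and `trim_retrieval` are hypothetical, not IR's API:

```python
def overlap_score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk (a crude relevance proxy)."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def trim_retrieval(query: str, chunks: list[str], keep: int = 2) -> list[str]:
    """Keep only the `keep` chunks most relevant to the query."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)
    return ranked[:keep]
```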
Sets a token ceiling on multi-turn conversations. Early turns are truncated or summarized on overflow — no more runaway token growth in long chats.
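A minimal sketch of the truncation path (the summarization path is omitted; the 4-characters-per-token heuristic and the function names are assumptions for illustration):

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token (an assumption, not a real tokenizer)
    return max(1, len(text) // 4)

def enforce_token_ceiling(messages: list[dict], max_tokens: int = 1000) -> list[dict]:
    """Drop the oldest non-system turns until the estimated total fits the ceiling."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while len(turns) > 1 and total(system + turns) > max_tokens:
        turns.pop(0)  # truncate the earliest turn first
    return system + turns
```

The system prompt is always preserved; only the oldest conversational turns are dropped, so recent context survives.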
The dashboard shows live token usage, spending trends, and model distribution. Per-request cost is tracked to six decimal places.
Automatically detects and redacts sensitive data in requests — phone numbers, emails, ID numbers — with full audit trails, so you can meet compliance requirements.
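Detection of this kind is typically pattern-based. A minimal sketch (the patterns below are illustrative toy rules, not IR's production detectors, and `redact` is a hypothetical name) that masks matches and records an audit trail:

```python
import re

# Illustrative patterns only; production redaction needs locale-aware rules
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace detected PII with placeholders; return the text plus an audit trail."""
    audit: list[tuple[str, str]] = []
    for label, pattern in PII_PATTERNS.items():
        def _mask(match: re.Match, label: str = label) -> str:
            audit.append((label, match.group()))
            return f"[{label}]"
        text = pattern.sub(_mask, text)
    return text, audit
```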
Every config change creates a new version with one-click rollback, and a quality check evaluates each new policy before it ships — changes stay safe and reversible.
One-line integration — results are immediate.
Crawls trending HuggingFace papers; an LLM produces summaries and categories, refreshed daily with the latest AI progress.
Upload a PDF; the model extracts key information and generates a mind map for deep paper reads and fast technical-document skims.
17 AI agents work together across HR, finance, IT operations, and more — natural-language-driven office workflows.
A tiered knowledge base built for AI agents — L0/L1/L2 summaries, MCP integration, and session memory.
Import Bilibili videos or upload local files. AI transcription, contextual Q&A, and auto notes make video learning more efficient.