heretic — What is it?

Heretic is an open-source tool designed to automatically remove censorship from transformer-based language models without the need for expensive post-training.

⭐ 23,358 Stars · 🍴 2,494 Forks · Python · AGPL-3.0 · Author: p-e-w
Source: README

Why it matters

Heretic is gaining attention because it automates censorship removal, a task that is time-consuming and requires expertise when done by hand. It stands out for combining directional ablation with TPE-based parameter optimization, which aims for high-quality abliteration while preserving as much of the original model's intelligence as possible.

Source: Synthesis of README and project traits

Core Features

Automatic Censorship Removal

Heretic automatically finds high-quality abliteration parameters by co-minimizing the number of refusals and KL divergence from the original model, resulting in a decensored model that retains much of the original model's intelligence.

Source: README
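
As a rough illustration of what co-minimizing two objectives means, the sketch below computes a KL divergence between two toy probability distributions, counts refusals with a simple keyword check, and keeps only the candidates that are not beaten on both metrics at once. All names here (`kl_divergence`, `count_refusals`, `pareto_front`, the marker list, and the trial values) are hypothetical and are not taken from Heretic's code. The manual filter also stands in for the search step only for illustration; Heretic itself is described as using a TPE-based optimizer.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def count_refusals(responses, markers=("i can't", "i cannot", "i won't")):
    """Count responses containing a refusal marker (hypothetical marker list)."""
    return sum(any(m in r.lower() for m in markers) for r in responses)


def pareto_front(trials):
    """Keep trials not dominated on (refusals, kl); lower is better for both."""
    front = []
    for t in trials:
        dominated = any(
            o["refusals"] <= t["refusals"]
            and o["kl"] <= t["kl"]
            and (o["refusals"] < t["refusals"] or o["kl"] < t["kl"])
            for o in trials
        )
        if not dominated:
            front.append(t)
    return front


# Made-up results for four parameter candidates (not measured data).
trials = [
    {"name": "a", "refusals": 40, "kl": 0.05},
    {"name": "b", "refusals": 3, "kl": 0.20},
    {"name": "c", "refusals": 5, "kl": 0.90},  # worse than "b" on both metrics
    {"name": "d", "refusals": 10, "kl": 0.10},
]
front_names = [t["name"] for t in pareto_front(trials)]
print(front_names)
```

The point of keeping a front rather than a single winner is that fewer refusals and lower KL divergence usually pull in opposite directions, so the final choice is a trade-off between the two.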

Support for Various Models

Heretic supports most dense models, including many multimodal models, several different MoE architectures, and even some hybrid models like Qwen3.5.

Source: README

Research Features

Heretic provides research features such as generating plots of residual vectors and printing details about residual geometry, which support the study of model semantics and interpretability.

Source: README
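
To give a feel for what "residual geometry" can mean in practice, the following sketch compares the mean residual vectors of two prompt groups and reports their difference and cosine similarity. It is an assumption that these are the quantities of interest; the grouping into `harmful` and `harmless`, the toy three-dimensional vectors, and the helper names are illustrative only and do not reflect Heretic's API or output format.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy residual vectors for two prompt groups (made-up numbers).
harmful = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
harmless = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

mean_harmful = mean_vector(harmful)
mean_harmless = mean_vector(harmless)

# Direction separating the two group means, plus how aligned the means are.
difference = [a - b for a, b in zip(mean_harmful, mean_harmless)]
similarity = cosine_similarity(mean_harmful, mean_harmless)
print(difference, round(similarity, 3))
```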

Architecture

Heretic is modular, with distinct components for analysis, configuration, evaluation, and system management. Its core technique is directional ablation, with the ablation parameters chosen by a TPE-based optimizer. The code tree is organized into a clear hierarchy intended to keep the project maintainable and scalable.

Source: Code tree + dependency files
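
Directional ablation can be sketched in a few lines: given a unit direction r in a layer's output space, the weight matrix W is replaced by W - weight * r rᵀ W, so the layer can no longer write along r when the weight is 1. This is a minimal pure-Python sketch of that general idea, not Heretic's implementation; `ablate_direction`, `matvec`, the `weight` parameter, and the 2x2 matrix are assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(matrix, direction, weight=1.0):
    """Return matrix - weight * r r^T matrix, with r the unit direction."""
    r = normalize(direction)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Component of each column of the matrix along r (this is r^T W).
    proj = [sum(r[i] * matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [matrix[i][j] - weight * r[i] * proj[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]


W = [[1.0, 2.0], [3.0, 4.0]]
direction = [1.0, 0.0]

W_ablated = ablate_direction(W, direction)
output = matvec(W_ablated, [1.0, 1.0])
print(W_ablated, output)
```

With full weight, the first output coordinate (the ablated direction) is zero for any input, while the orthogonal coordinate is untouched. A weight below 1 removes only part of the component, which is the kind of knob an optimizer could tune.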

Project Knowledge Graph

[Figure: knowledge graph with the heretic project at the center, the core features (Automatic Censorship Removal, Support for Various Models, Research Features) in the inner ring, and key dependencies (accelerate, bitsandbytes, datasets, huggingface-hub, immutabledict) in the outer ring.]

Center: project; inner ring: core feature modules; outer ring: key dependencies. Auto-generated from core_features and tech_stack.key_deps.

Tech Stack

Language: Python
Framework: PyTorch
Key dependencies: accelerate, bitsandbytes, datasets, huggingface-hub, immutabledict, kernels, langdetect, lm-eval[hf], numpy, optuna, peft, psutil, py-cpuinfo, pydantic-settings, questionary, rich, tomli-w, tqdm, transformers
Source: Dependency files + code tree

Quick Start

Prepare a Python 3.10+ environment with PyTorch 2.2+ installed. Run `pip install -U heretic-llm` and then `heretic <model>` to decensor a model.
Source: README Installation/Quick Start
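
Before installing, it may help to confirm that the environment meets the stated minimums. The sketch below checks the running Python version and, if a `torch` distribution is installed, compares its version against 2.2 using only the standard library. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of Heretic, and the version parsing is deliberately simplified.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '2.2.1+cu121' into (2, 2, 1); local and pre-release suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """True if the parsed version is at least the given minimum tuple."""
    return parse_version(version_text) >= tuple(minimum)


python_ok = sys.version_info >= (3, 10)

try:
    # Reads package metadata without importing torch itself.
    torch_ok = meets_minimum(metadata.version("torch"), (2, 2))
except metadata.PackageNotFoundError:
    torch_ok = False

print(f"Python 3.10+: {python_ok}; PyTorch 2.2+: {torch_ok}")
```

If both checks print `True`, the `pip install -U heretic-llm` and `heretic <model>` commands from the Quick Start should have the prerequisites they expect.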

Use Cases

Heretic is suitable for developers and researchers working with language models who need to remove censorship without manual intervention. It is useful in scenarios where models need to generate responses on sensitive topics, such as political or social issues.

Source: README

Strengths & Limitations

Strengths

  • Fully automatic censorship removal process
  • Supports a wide range of models
  • Provides research features for model interpretability

Limitations

  • Limited support for certain model architectures
  • May require significant computational resources for large models
Source: Synthesis of README, code structure and dependencies

Latest Release

v1.3.0 (2026-05-05): Implemented reproducible runs.

Source: GitHub Releases

Verdict

Heretic is a promising project for those working with language models who need to remove censorship efficiently. It is particularly suitable for developers and researchers who require high-quality abliteration without manual intervention and are willing to invest in computational resources.

Source: Synthesis

Transparency Notice
This page is auto-generated by AI (a large language model) from the following public materials: GitHub README, code tree, dependency files and release notes. Analyzed at: 2026-06-02 18:31. Quality score: 85/100.

Data sources: README, GitHub API, dependency files