
Local LLM Screens Gmail for Smart Notifications

A developer built an open-source system using a locally run large language model to intelligently filter Gmail and send notifications only for important messages.

What It Is

A developer has created an open-source system that runs a large language model locally to scan Gmail and send notifications only for messages that matter. The project, available at https://github.com/IngeniousIdiocy/LocalLLMMailScreener, uses Qwen3 235B quantized to 8-bit precision running on a Node.js server. Instead of relying on simple keyword filters or cloud-based AI services, the system processes emails entirely on local hardware, analyzing message content with natural language understanding to determine which emails warrant immediate attention.

The screener operates continuously in the background, checking Gmail at regular intervals. When new messages arrive, the local LLM evaluates them against custom prompts that define what counts as “priority.” Messages matching these criteria trigger push notifications to a mobile device, while routine newsletters, promotional emails, and low-priority correspondence remain silent until manual review.
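That loop can be sketched roughly as follows. The names `fetchNewMessages`, `classifyEmail`, and `notify` are hypothetical stand-ins for the project's actual Gmail polling, local LLM call, and push mechanism, which the write-up does not detail, and the "PRIORITY"/"SKIP" reply format is an assumption:

```javascript
// Decide from the model's reply whether to notify. Assumes the screening
// prompt instructs the model to answer "PRIORITY" or "SKIP"; the project's
// actual reply format may differ.
function shouldNotify(llmReply) {
  return llmReply.trim().toUpperCase() === "PRIORITY";
}

// One pass over newly arrived mail: classify each message with the local
// LLM and push a notification only when it is judged priority.
async function screenInbox(fetchNewMessages, classifyEmail, notify) {
  for (const msg of await fetchNewMessages()) {
    const reply = await classifyEmail(msg); // runs entirely on local hardware
    if (shouldNotify(reply)) {
      await notify(msg); // routine mail stays silent
    }
  }
}

// Re-run on a fixed interval, e.g. every five minutes:
// setInterval(() => screenInbox(fetchNew, classify, push), 5 * 60 * 1000);
```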

Why It Matters

Email overload has become a genuine productivity problem. Most professionals receive dozens or hundreds of messages daily, making it impractical to enable notifications for every incoming email. Traditional filtering relies on sender addresses or basic keyword matching, which frequently fails to catch important messages from new contacts or unexpected sources. A school cancellation notice from an unfamiliar email address, for instance, might slip through conventional filters.

Running LLM-based screening locally addresses a critical privacy concern that cloud-based solutions cannot. When Gmail content passes through third-party APIs for analysis, sensitive business communications, medical information, or personal correspondence becomes exposed to external services. Local processing keeps all email content on hardware under direct control, eliminating this exposure entirely.

The approach also demonstrates practical applications for quantized models. Qwen3 235B at 8-bit precision requires substantial RAM but remains feasible on high-end consumer hardware like a well-specced Mac Studio. This represents a middle ground: more demanding than running smaller local models, but achievable without enterprise infrastructure or a cloud dependency.

Getting Started

Setting up the screener requires several configuration steps, the first of which is cloning the repository.
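Using the repository URL given earlier, that step looks like this (the `npm install` is an assumption based on the project's Node.js server; the repository's README documents the exact steps):

```shell
# Fetch the project and install its Node.js dependencies.
git clone https://github.com/IngeniousIdiocy/LocalLLMMailScreener.git
cd LocalLLMMailScreener
npm install
```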

The system needs Gmail API credentials to access email. Developers must create a project in Google Cloud Console, enable the Gmail API, and download OAuth 2.0 credentials. The repository documentation provides specific instructions for this authentication setup.
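Once credentials exist, new mail can be pulled with Google's official `googleapis` Node.js client. The helper below only builds the Gmail search query (Gmail's `after:` operator accepts a Unix timestamp in seconds); the surrounding auth and list call are sketched in a comment, and none of this is taken from the repository itself:

```javascript
// Build a Gmail search query for unread mail newer than the last check.
// Gmail's search syntax accepts a Unix timestamp (in seconds) for after:.
function unreadQuery(sinceEpochSeconds) {
  return `is:unread after:${sinceEpochSeconds}`;
}

// With the googleapis client, this query would be used roughly as:
//   const gmail = google.gmail({ version: "v1", auth });
//   const res = await gmail.users.messages.list({
//     userId: "me",
//     q: unreadQuery(lastCheck),
//   });
```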

Custom screening prompts define what qualifies as priority. These prompts might specify criteria like “messages about children’s school activities,” “emails requiring response within 24 hours,” or “communications from specific project stakeholders.” The LLM evaluates each message against these natural language descriptions rather than rigid rules.
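One plausible way to assemble such a prompt; the function name and message format below are illustrative, not taken from the repository:

```javascript
// Combine the user's natural-language criteria and an email into a single
// screening prompt that asks the model for a one-word verdict.
function buildScreeningPrompt(criteria, email) {
  return [
    "You are an email screener. Answer with exactly PRIORITY or SKIP.",
    "Notify only for messages matching any of these criteria:",
    ...criteria.map((c) => `- ${c}`),
    "",
    `From: ${email.from}`,
    `Subject: ${email.subject}`,
    `Body: ${email.body}`,
  ].join("\n");
}
```

Constraining the reply to a single word makes the verdict easy for the caller to parse reliably.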

Hardware requirements are significant. At 8-bit precision each parameter occupies one byte, so the quantized Qwen3 235B weights alone take roughly 235 GB, putting the full model beyond typical workstations and into top-memory configurations such as a 512 GB Mac Studio. Teams with less powerful hardware might experiment with smaller models, though this trades some comprehension capability for reduced resource consumption.
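The back-of-envelope arithmetic, assuming weights dominate (KV cache and activations add more on top):

```javascript
// Rough weight-memory estimate for a quantized model, weights only.
// paramsBillion * 1e9 params * (bits / 8) bytes, divided by 1e9 for GB,
// simplifies to paramsBillion * bits / 8.
function weightMemoryGB(paramsBillion, bitsPerParam) {
  return (paramsBillion * bitsPerParam) / 8;
}

// Qwen3 235B at 8-bit: 235 GB of weights; a 4-bit quantization would halve it.
```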

Context

Cloud-based alternatives like Gmail’s built-in priority inbox or services using GPT-4 API calls offer simpler deployment. These solutions require minimal local resources and work across devices without dedicated hardware. However, they necessarily expose email content to external processing, which may violate privacy policies or personal preferences for sensitive communications.

Traditional rule-based filters remain the most resource-efficient option but lack semantic understanding. A filter catching emails with “urgent” in the subject line misses messages that are time-sensitive without explicit urgency markers.
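For contrast, a keyword filter of the kind described is trivial to write and equally trivial to fool:

```javascript
// Rule-based filtering: cheap to run, but blind to meaning.
function keywordFilter(subject) {
  return /\burgent\b/i.test(subject);
}

// It flags the explicit marker but misses a message that is clearly
// time-sensitive without using the word:
// keywordFilter("URGENT: sign contract")                   -> true
// keywordFilter("School closed tomorrow, pickup at noon")  -> false
```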

The main limitation of local LLM screening is hardware dependency. Running large models locally creates a single point of failure: the system only functions when the host machine is powered on and connected. This differs from cloud solutions that operate continuously regardless of personal device status. Teams considering this approach should evaluate whether the privacy benefits justify the infrastructure requirements and reduced flexibility compared to cloud-based alternatives.