Released May 15, 2026

For You Algorithm
Deep Dive

A comprehensive, interactive guide to how X decides what appears in your For You feed — from raw candidates to ranked results.

7 stages
Pipeline stages
10+ filters
Content filters
19 signals
Engagement signals
2 sources
In + out of network

How Your Feed Gets Built

The For You feed blends posts from accounts you follow (in-network) with posts discovered through ML (out-of-network). Everything is ranked by Phoenix — a Grok-based transformer model that predicts how likely you are to engage with each post.

Thunder — In-Network

An in-memory post store, fed by Kafka events, that tracks recent posts from all users in real time. It serves posts from accounts you follow with sub-millisecond lookups.

Rust · In-Memory
🔭

Phoenix Retrieval — Out-of-Network

A two-tower ML model that encodes you and all posts into embedding vectors, then retrieves the most relevant out-of-network posts via approximate nearest-neighbour search.

JAX · Two-Tower
🧠

Phoenix Ranking — The Brain

A Grok-based transformer that reads your engagement history and scores every candidate post across 19 engagement types. The weighted sum of those predictions, adjusted for author diversity and network source, becomes each post's final score.

Transformer · Multi-action
🧹

Home Mixer — Orchestration

The glue layer written in Rust that wires together all pipeline stages: query hydration, candidate sourcing, enrichment, filtering, scoring, and final selection.

Rust · gRPC
🔍

Grox — Content Understanding

An AI pipeline that classifies every new post for spam, safety violations, and topic category using Grok-powered vision-language models before posts enter the ranking pool.

Python · Grok VLM
📐

Candidate Pipeline — Framework

A reusable Rust framework defining composable traits (Source, Filter, Scorer, Hydrator…) that run in parallel where possible with built-in observability and error handling.

Rust · Async

The 7-Stage Journey

Every feed request runs through these stages in sequence.

1
Query Hydration (parallel, async)
Load the user's full context before touching any candidates

All query hydrators run in parallel. Their results are merged back into the query object before the next stage begins.

Hydrators: UserActionSequence, FollowedUserIds, MutedUserIds, BlockedUserIds, UserDemographics, ImpressionBloomFilter, FollowedGrokTopics, StarterPacks, MutualFollowGraph, ServedHistory
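
The fan-out-and-merge pattern described above can be sketched as follows. This is a minimal illustration, not the Home Mixer code: it uses scoped standard-library threads in place of the Tokio async runtime the guide mentions, and the Query struct, fetch_followed, fetch_muted, and hydrate_query are hypothetical names with placeholder data.

```rust
use std::thread;

/// Hypothetical, trimmed-down query object. The field names mirror two of the
/// hydrators listed above; the types are illustrative assumptions.
#[derive(Debug, PartialEq)]
struct Query {
    followed_user_ids: Vec<u64>,
    muted_user_ids: Vec<u64>,
}

// Stand-ins for real lookups (assumed; real hydrators would call remote services).
fn fetch_followed(user_id: u64) -> Vec<u64> {
    vec![user_id + 1, user_id + 2]
}

fn fetch_muted(user_id: u64) -> Vec<u64> {
    vec![user_id + 100]
}

/// Runs both hydrators concurrently, then merges their results into one query.
fn hydrate_query(user_id: u64) -> Query {
    let (followed, muted) = thread::scope(|s| {
        let followed = s.spawn(|| fetch_followed(user_id));
        let muted = s.spawn(|| fetch_muted(user_id));
        // Both threads are joined before the scope ends, so the next stage
        // only ever sees a fully hydrated query.
        (followed.join().unwrap(), muted.join().unwrap())
    });
    Query { followed_user_ids: followed, muted_user_ids: muted }
}
```

Because neither hydrator reads the other's output, they can run side by side and the merge is a plain struct construction.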
2
Candidate Sourcing (parallel)
Pull posts from multiple sources simultaneously

Sources run in parallel and their results are pooled together into a single candidate list for the next stage.

IN-NETWORK
Thunder — sub-ms lookups from an in-memory store of every followed account's recent posts
OUT-OF-NETWORK
Phoenix Retrieval, Phoenix MoE, Phoenix Topics, ads, Who-to-Follow recommendations, prompts, cached posts
3
Candidate Hydration (parallel, async)
Enrich candidates with the metadata needed for filtering and scoring

Hydrators fetch additional data and write it back to each candidate. They run in parallel since they don't depend on each other.

Hydrators: CoreData (text, media), AuthorInfo, EngagementCounts, VideoDuration, SubscriptionStatus, BrandSafety, LanguageCode, MutualFollowScore, QuotePostExpansion, AuthorBlocksViewer
4
Pre-Scoring Filters (sequential)
Eliminate candidates that should never reach the scorer

Filters run one after another. Each partitions candidates into "kept" and "removed." Removed candidates are discarded (or tracked for logging) and never scored.

Removes: duplicates, posts older than the age threshold, your own posts, posts from blocked/muted accounts, previously seen posts, paywalled content you can't access, and muted keywords.
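
A minimal sketch of the kept/removed partitioning, assuming each filter can be expressed as a simple predicate. The Candidate fields, the Filter type alias, run_filters, and example_filters are hypothetical; the real filter trait is not shown in this guide.

```rust
/// Hypothetical candidate with only the fields these two filters need.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    post_id: u64,
    author_id: u64,
    age_secs: u64,
}

/// A filter is any predicate that returns true for candidates to keep.
type Filter = Box<dyn Fn(&Candidate) -> bool>;

/// Applies filters one after another; each splits the survivors into kept and removed.
fn run_filters(candidates: Vec<Candidate>, filters: &[Filter]) -> (Vec<Candidate>, Vec<Candidate>) {
    let mut kept = candidates;
    let mut removed = Vec::new();
    for filter in filters {
        let (pass, fail): (Vec<_>, Vec<_>) = kept.into_iter().partition(|c| filter(c));
        kept = pass;
        removed.extend(fail); // retained only so callers can log them
    }
    (kept, removed)
}

/// Example filter set: drop the viewer's own posts, then posts older than `max_age_secs`.
fn example_filters(viewer_id: u64, max_age_secs: u64) -> Vec<Filter> {
    vec![
        Box::new(move |c: &Candidate| c.author_id != viewer_id) as Filter,
        Box::new(move |c: &Candidate| c.age_secs <= max_age_secs) as Filter,
    ]
}
```

Later filters only see what earlier filters kept, which is why ordering cheap filters first reduces work for the expensive ones.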

5
Scoring (sequential scorers)
Predict engagement probabilities and compute the final rank score

Scorers run in order, each updating candidates with new fields. The full scoring chain is:

🧠
Phoenix Scorer
Sends all candidates to the Grok transformer. Gets back 19 engagement probability scores per post.
⚖️
Weighted Scorer
Combines the 19 probabilities into one score: Σ(weight × P(action)). Negative actions like block/report have negative weights.
🎭
Author Diversity Scorer
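Applies an exponential decay multiplier to repeated authors. An author's first post keeps its score; the 2nd is scaled by roughly decay, the 3rd by roughly decay², and so on, with a floor that the multiplier never drops below.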
🌐
OON Scorer
Out-of-network posts get score × OON_WEIGHT_FACTOR. New users and topic-filtered feeds use different factors.
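
The Weighted Scorer step can be sketched as a plain weighted sum. This is an illustration only: the ActionPrediction struct and weighted_score function are hypothetical names, and any weights passed in are made-up values, not production ones.

```rust
/// One predicted action: its probability and the weight applied to it.
struct ActionPrediction {
    probability: f64,
    weight: f64, // negative for actions such as block or report
}

/// Weighted sum over all predicted actions: the sum of weight * P(action).
fn weighted_score(predictions: &[ActionPrediction]) -> f64 {
    predictions.iter().map(|p| p.weight * p.probability).sum()
}
```

For example, with assumed weights of 1.0 for a like (P = 0.12), 2.0 for a reply (P = 0.04), and -10.0 for a report (P = 0.01), the score is 0.12 + 0.08 - 0.10 = 0.10. A small predicted chance of a strongly negative action can cancel out most of the positive signal.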
6
Selection
Sort by final score, pick the top K

The TopKScoreSelector sorts all surviving candidates by their final score (descending) and takes the top K. Non-selected candidates are passed to side effects for logging and caching.
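
A minimal sketch of top-K selection, assuming candidates carry a single final score. Scored and select_top_k are hypothetical names, not the TopKScoreSelector API.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    post_id: u64,
    score: f64,
}

/// Sorts by score (descending) and splits into the top `k` and the rest.
/// The rest is returned too, so it can be handed to logging or caching.
fn select_top_k(mut candidates: Vec<Scored>, k: usize) -> (Vec<Scored>, Vec<Scored>) {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let cut = k.min(candidates.len());
    let rest = candidates.split_off(cut);
    (candidates, rest)
}
```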

7
Post-Selection Filters + Side Effects
Final safety pass, then log and cache for next request

VF Filter runs a visibility-filtering check and drops anything that has been deleted or flagged as spam, violent content, or gore since the pre-scoring stage. DedupConversation removes duplicate branches of the same thread.

Side effects run async in the background: caching scored posts in Redis, publishing served candidate IDs to Kafka, updating impression history, logging for A/B experiments.

System Architecture

Six major subsystems, each with a distinct responsibility and technology stack.

Home Mixer

Orchestration Layer

Written in Rust. Exposes a gRPC ScoredPostsService endpoint. Wires together all pipeline stages and owns the final response format.

  • Rust
  • gRPC / Tonic
  • Tokio async
Thunder

In-Network Post Store

Consumes post create/delete Kafka events in real time. Maintains per-user stores for original posts, replies/reposts, and video posts. Auto-trims old posts.

  • Rust
  • Kafka
  • In-Memory
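
A rough sketch of a per-author in-memory store with create, delete, and trim operations. It is a single-threaded illustration under stated assumptions: PostStore, its methods, and max_age_secs are hypothetical, Kafka consumption is omitted, and the separate stores for replies/reposts and video are not modeled.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq)]
struct Post {
    post_id: u64,
    created_at_secs: u64,
}

/// Hypothetical per-author store; `max_age_secs` is an assumed retention setting.
struct PostStore {
    by_author: HashMap<u64, VecDeque<Post>>,
    max_age_secs: u64,
}

impl PostStore {
    fn new(max_age_secs: u64) -> Self {
        Self { by_author: HashMap::new(), max_age_secs }
    }

    /// Handles a "post created" event; assumes events arrive in time order per author.
    fn on_create(&mut self, author_id: u64, post: Post) {
        self.by_author.entry(author_id).or_default().push_back(post);
    }

    /// Handles a "post deleted" event.
    fn on_delete(&mut self, author_id: u64, post_id: u64) {
        if let Some(posts) = self.by_author.get_mut(&author_id) {
            posts.retain(|p| p.post_id != post_id);
        }
    }

    /// Drops posts older than the retention window from the front of each queue.
    fn trim(&mut self, now_secs: u64) {
        let cutoff = now_secs.saturating_sub(self.max_age_secs);
        for posts in self.by_author.values_mut() {
            while posts.front().map_or(false, |p| p.created_at_secs < cutoff) {
                posts.pop_front();
            }
        }
    }

    /// Collects recent posts from every followed account.
    fn posts_for(&self, followed: &[u64]) -> Vec<Post> {
        followed
            .iter()
            .filter_map(|id| self.by_author.get(id))
            .flat_map(|posts| posts.iter().cloned())
            .collect()
    }
}
```

Keying the map by author makes an in-network lookup a handful of hash-map reads, one per followed account, with no scan over unrelated posts.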
Phoenix

ML Retrieval + Ranking

Two JAX models: a two-tower retrieval model (user + candidate towers) and a Grok-based transformer ranker. Ported from Grok-1, adapted for RecSys.

  • JAX / Haiku
  • Grok transformer
  • ANN search
Grox

Content Understanding

Python pipeline that classifies new posts for spam, safety violations, and topics using Grok's VLM. Powers embeddings and policy enforcement at ingest time.

  • Python
  • Grok VLM
  • Kafka
Candidate Pipeline

Reusable Framework

A Rust crate defining trait-based abstractions for building recommendation pipelines. Sources, Hydrators, Filters, Scorers, Selectors, and SideEffects.

  • Rust traits
  • Parallel exec
  • Stats / tracing
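
To illustrate the trait-based design, here is a minimal sketch with three of the listed roles. The trait signatures are assumptions made for this example (the real crate's signatures are not shown in this guide), and everything runs sequentially for brevity, whereas the framework runs stages in parallel where possible.

```rust
/// Minimal candidate type for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    post_id: u64,
    score: f64,
}

// Hypothetical trait shapes, not the real crate's API.
trait Source {
    fn fetch(&self) -> Vec<Candidate>;
}
trait Filter {
    fn keep(&self, candidate: &Candidate) -> bool;
}
trait Scorer {
    fn score(&self, candidate: &mut Candidate);
}

/// Pools all sources, applies every filter, then runs scorers in order.
fn run_pipeline(
    sources: &[Box<dyn Source>],
    filters: &[Box<dyn Filter>],
    scorers: &[Box<dyn Scorer>],
) -> Vec<Candidate> {
    let mut pool: Vec<Candidate> = sources.iter().flat_map(|s| s.fetch()).collect();
    pool.retain(|c| filters.iter().all(|f| f.keep(c)));
    for scorer in scorers {
        pool.iter_mut().for_each(|c| scorer.score(c));
    }
    pool
}

// Toy implementations used to exercise the pipeline.
struct StaticSource(Vec<u64>);
impl Source for StaticSource {
    fn fetch(&self) -> Vec<Candidate> {
        self.0.iter().map(|&post_id| Candidate { post_id, score: 0.0 }).collect()
    }
}
struct EvenIdFilter;
impl Filter for EvenIdFilter {
    fn keep(&self, candidate: &Candidate) -> bool {
        candidate.post_id % 2 == 0
    }
}
struct ConstantScorer(f64);
impl Scorer for ConstantScorer {
    fn score(&self, candidate: &mut Candidate) {
        candidate.score += self.0;
    }
}
```

Because the pipeline only depends on the traits, a new source or filter can be added without touching run_pipeline.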
Ads Blending

Ad Injection

New in 2026. Blends ads into the organic feed at appropriate positions. Tracks brand-safety signals so ads don't appear adjacent to sensitive content.

  • Rust
  • Brand safety
  • Partition blend

How Posts Get Their Score

The Phoenix transformer predicts probabilities for 19 engagement types. These are combined via a weighted sum, with negative weights for actions that signal you'd rather not see a post.

Score Simulator

Example engagement probabilities and the score the simulator estimates for them:

✦ Positive Signals
Favorite, P(like) × weight: 0.12
Reply, P(reply) × weight: 0.04
Repost, P(repost) × weight: 0.05
Share, P(share) × weight: 0.03
Click / Dwell, P(click + dwell) × weight: 0.08
Follow Author, P(follow) × weight: 0.01

✗ Negative Signals
Not Interested (negative weight): 0.02
Block Author (strong negative): 0.00
Report (strongest negative): 0.00

Estimated Score: 0.33 (on a scale from Low through Average to Top)
Decent engagement predicted. This post would likely be included in the feed but won't rank in the top tier.

// The actual scoring formula from ranking_scorer.rs
let score = apply(p_fav, w_fav)
    + apply(p_reply, w_reply)
    + apply(p_repost, w_repost)
    + apply(p_share, w_share) // + 13 more...
    + apply(p_not_interested, -w_ni) // negative
    + apply(p_block_author, -w_block) // negative
    + apply(p_report, -w_report); // negative

// Author diversity: exponential decay for repeated authors
let multiplier = (1.0 - floor) * decay.powf(position) + floor;
let diversity_score = score * multiplier;

// OON penalty: out-of-network posts score × oon_weight_factor
let final_score = if !in_network { diversity_score * oon_factor } else { diversity_score };
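
The diversity and out-of-network steps of the excerpt above can be made runnable as follows. This is a sketch under assumptions: Candidate, diversity_multiplier, and final_scores are hypothetical names, position is taken to be the 0-based count of earlier posts by the same author in descending score order, and the decay, floor, and oon_factor values are chosen by the caller rather than taken from production.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
struct Candidate {
    author_id: u64,
    in_network: bool,
    score: f64, // weighted score from the previous step
}

/// Multiplier for the `position`-th post (0-based) by the same author.
fn diversity_multiplier(position: u32, decay: f64, floor: f64) -> f64 {
    (1.0 - floor) * decay.powi(position as i32) + floor
}

/// Applies author-diversity decay in descending score order, then the OON factor.
/// The returned list keeps the pre-adjustment order; it is not re-sorted here.
fn final_scores(mut candidates: Vec<Candidate>, decay: f64, floor: f64, oon_factor: f64) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut seen: HashMap<u64, u32> = HashMap::new();
    for c in candidates.iter_mut() {
        let position = seen.entry(c.author_id).or_insert(0);
        c.score *= diversity_multiplier(*position, decay, floor);
        *position += 1;
        if !c.in_network {
            c.score *= oon_factor;
        }
    }
    candidates
}
```

With decay 0.5 and floor 0.25, the multipliers for an author's first three posts are 1.0, 0.625, and 0.4375. The floor keeps the multiplier from decaying all the way to zero, so a prolific author is demoted rather than excluded.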

What Gets Removed & Why

Filters run at two points: before scoring (to avoid wasting ML compute on ineligible posts) and after selection (for final safety checks).

Filter | Stage | What it removes
DropDuplicatesFilter | Pre | Candidate posts with duplicate IDs in the same request
CoreDataHydrationFilter | Pre | Posts that failed to hydrate core metadata (e.g., deleted before hydration)
AgeFilter | Pre | Posts older than the configured max age threshold
SelfTweetFilter | Pre | Posts authored by the viewing user (your own posts)
RetweetDeduplicationFilter | Pre | Multiple reposts pointing to the same original post
IneligibleSubscriptionFilter | Pre | Paywalled / subscription-only content the viewer hasn't subscribed to
PreviouslySeenPostsFilter | Pre | Posts the user has already seen, tracked via impression bloom filter
PreviouslyServedPostsFilter | Pre | Posts already served to the user in a recent prior request
MutedKeywordFilter | Pre | Posts containing any keyword or phrase the user has muted
AuthorSocialgraphFilter | Pre | Posts from blocked or muted accounts, or accounts that block the viewer; also covers quoted/retweeted authors
TopicIdsFilter | Pre | Posts that don't match the viewer's active topic filters
VideoFilter | Pre | Video posts below the minimum duration threshold
VFFilter | Post | Posts flagged as deleted, spam, violent content, or gore since the pre-scoring stage
DedupConversationFilter | Post | Duplicate branches of the same conversation thread to avoid showing the same thread multiple times
AncillaryVFFilter | Post | Visibility-filtered ancillary posts (quoted tweets, parent replies) attached to otherwise-visible candidates
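
The impression Bloom filter behind PreviouslySeenPostsFilter can be illustrated with a small sketch. SeenFilter and drop_previously_seen are hypothetical names, the bit count and hash count are arbitrary, and the hashing scheme (the standard library's DefaultHasher with a per-hash seed) is an assumption chosen for brevity, not the production design.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Tiny Bloom filter over post IDs; sizes are illustrative, not production values.
struct SeenFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl SeenFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derives the bit index for one (seed, post) pair.
    fn index(&self, post_id: u64, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        (seed, post_id).hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn mark_seen(&mut self, post_id: u64) {
        for seed in 0..self.num_hashes {
            let i = self.index(post_id, seed);
            self.bits[i] = true;
        }
    }

    /// False means definitely unseen; true means probably seen (false positives possible).
    fn probably_seen(&self, post_id: u64) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(post_id, seed)])
    }
}

/// Keeps only candidates the filter has not recorded as seen.
fn drop_previously_seen(post_ids: &[u64], seen: &SeenFilter) -> Vec<u64> {
    post_ids.iter().copied().filter(|id| !seen.probably_seen(*id)).collect()
}
```

The trade-off is that a Bloom filter never misses a post it has recorded but may occasionally report an unseen post as seen, which costs one candidate in exchange for a compact, fixed-size impression history.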

The Grok-Powered Brain

Phoenix is a two-stage ML system: a two-tower retrieval model to narrow millions of posts to thousands, and a Grok-based transformer ranker to score each one with full context.

Stage 1 — Retrieval

Two-Tower Model

Encodes you and every post into a shared embedding space. Finds the top-K posts most similar to you via dot-product ANN search.

User Tower
Engagement history → user embedding
·
Item Tower
Post content → post embedding
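
A minimal sketch of the retrieval step, assuming the two towers have already produced embeddings. Exhaustive dot-product search stands in for the approximate nearest-neighbour index here, and dot and retrieve_top_k are hypothetical names.

```rust
/// Dot product of two equal-length embedding vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Returns the IDs of the `k` posts whose embeddings score highest against the user.
/// Exhaustive search is used in place of an ANN index, which would return
/// approximately the same set at far lower cost on a large corpus.
fn retrieve_top_k(user: &[f32], posts: &[(u64, Vec<f32>)], k: usize) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = posts
        .iter()
        .map(|(id, emb)| (*id, dot(user, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}
```

Since post embeddings do not depend on the viewer, they can be computed once per post and indexed ahead of time; only the user embedding has to be computed per request.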
Stage 2 — Ranking

Transformer with Candidate Isolation

Input: [user token] + [engagement history sequence] + [candidate posts]. Candidates attend to user + history but not to each other.

Input: [User] [H₁ H₂ … Hₛ] [C₁ C₂ … Cₙ]
Output: logits[B, n_candidates, 19 actions]

Candidate Isolation — The Attention Mask

Candidates can only attend to the user token and engagement history — never to each other. This means each post's score is independent of what else is in the batch, making scores cacheable and consistent across requests.

[Figure: attention-mask grid. Legend: can attend, blocked, self only.]
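
The mask can be sketched as a boolean matrix over the token layout [user] [history] [candidates]. The function name candidate_isolation_mask is hypothetical, and one detail is an assumption: user and history tokens are allowed to attend to the whole context, since the guide does not say whether the context itself is causally masked.

```rust
/// Builds a boolean mask where `mask[q][k]` is true when token `q` may attend to token `k`.
/// Token layout: [user] [history_len history tokens] [num_candidates candidates].
fn candidate_isolation_mask(history_len: usize, num_candidates: usize) -> Vec<Vec<bool>> {
    let context_len = 1 + history_len; // user token + history
    let total = context_len + num_candidates;
    let mut mask = vec![vec![false; total]; total];
    for q in 0..total {
        for k in 0..total {
            let key_is_context = k < context_len;
            // Every token may read the context; beyond that, a token sees only itself.
            // Assumption: context tokens attend to the whole context (no causal masking).
            mask[q][k] = key_is_context || q == k;
        }
    }
    mask
}
```

With 2 history tokens and 2 candidates, the candidate at index 3 can attend to indices 0 to 2 and to itself, but not to the candidate at index 4. Adding or removing other candidates therefore cannot change its row of allowed keys.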

Hash-Based Embeddings

Neither the ranking nor retrieval model uses hand-crafted feature IDs. Instead, both users and posts are embedded via multiple independent hash functions:

User hashes — 2 independent hashes per user ID
Item hashes — 2 independent hashes per post ID
Author hashes — 2 independent hashes per author ID

Multiple hash embeddings are summed (reduced) before entering the transformer, providing collision resistance and graceful handling of unseen IDs.
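
A small sketch of the multi-hash lookup, assuming two embedding tables per ID type and a sum reduction as described above. The table size, embedding width, and the splitmix64-style mixing function are illustrative choices, not details taken from the Phoenix code.

```rust
const NUM_BUCKETS: usize = 8; // illustrative table size
const DIM: usize = 4; // illustrative embedding width

/// Cheap seeded integer hash (a splitmix64-style mixer), used only for illustration.
fn bucket(id: u64, seed: u64) -> usize {
    let mut x = id ^ seed.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    ((x ^ (x >> 31)) % NUM_BUCKETS as u64) as usize
}

/// Looks up one row per hash function and sums them into a single embedding.
fn embed(id: u64, tables: &[[[f32; DIM]; NUM_BUCKETS]; 2]) -> [f32; DIM] {
    let mut out = [0.0f32; DIM];
    for (seed, table) in tables.iter().enumerate() {
        let row = &table[bucket(id, seed as u64 + 1)];
        for (o, r) in out.iter_mut().zip(row) {
            *o += r;
        }
    }
    out
}
```

Two IDs produce the same summed embedding only if they collide under both hash functions, which is much less likely than a single collision. An ID never seen in training still maps to valid rows, so it gets a usable (if uninformative) embedding rather than a lookup failure.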

AI-Powered Post Classification

Every post goes through Grox before it can be ranked. Grox uses Grok's vision-language model to understand text and images together — enabling nuanced classification that pure text models miss.

🚫

Spam Detection

Uses Grok VLM to classify posts as spam. Has a dedicated path for low-follower accounts (SpamEapiLowFollowerClassifier) that applies stricter standards to new or small accounts.

Safety
🛡️

Safety Screening

Two-pass safety system: a fast initial screen (BangerInitialScreen) followed by the full PostSafetyScreenDeluxe that evaluates PTOS (Policy, Terms of Service) categories.

Safety
📂

Content Classification

Classifies posts into categories to power topic-based feeds and filtering experiments. Supports post-based filtering at 90%, 75%, and 50% confidence thresholds.

Topics
🔢

Multimodal Embeddings

Generates dense embedding vectors for posts using both text and image content (v2 and v5 embedders). Used as features for the Phoenix retrieval model's candidate tower.

ML
📝

Post Summarization

Generates natural-language summaries of posts, used as additional input features to the Phoenix embedding pipeline for richer content understanding.

NLP
⚙️

Task Engine

A DAG-based task scheduler (grox/engine.py) that orchestrates classifiers, embedders, and publishers. Tasks declare dependencies and the engine resolves execution order.

Infrastructure
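
Dependency resolution in a DAG-based engine can be sketched with Kahn's topological sort. This is not the grox/engine.py implementation, which is written in Python and not shown in this guide; execution_order is a hypothetical function, written in Rust to match the guide's other code.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns an execution order where every task runs after its dependencies,
/// or `None` if the dependencies contain a cycle (or name an undeclared task).
/// `deps` maps each task name to the tasks it depends on.
fn execution_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Count unmet dependencies per task and record who is waiting on whom.
    let mut pending: BTreeMap<&str, usize> = deps.iter().map(|(t, d)| (*t, d.len())).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (task, task_deps) in deps {
        for dep in task_deps {
            dependents.entry(*dep).or_default().push(*task);
        }
    }
    // Start with tasks that have nothing to wait for.
    let mut ready: VecDeque<&str> =
        pending.iter().filter(|(_, n)| **n == 0).map(|(t, _)| *t).collect();
    let mut order = Vec::new();
    while let Some(task) = ready.pop_front() {
        order.push(task.to_string());
        for next in dependents.get(task).into_iter().flatten() {
            let n = pending.get_mut(next)?;
            *n -= 1;
            if *n == 0 {
                ready.push_back(*next);
            }
        }
    }
    // Tasks left unscheduled mean a cycle or a dependency that was never declared.
    (order.len() == deps.len()).then_some(order)
}
```

As a usage example with made-up task names, a "publish" task that depends on "classify" and "embed", which both depend on "fetch", resolves to fetch, classify, embed, publish.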

What This Means for You

Six things worth knowing about how the algorithm actually works, and what they imply for users and creators.

1

No hand-engineered features

The system has eliminated manual feature engineering entirely. The Grok transformer learns what matters directly from your engagement history. This means the algorithm adapts continuously to new content types and user behaviours without engineer intervention.

2

Your negative actions matter a lot

Block, mute, "not interested," and report all carry negative weights in the scoring formula. Using them actively trains the algorithm away from similar content. The scoring formula explicitly penalises posts you're predicted to dislike.

3

Author diversity is enforced algorithmically

The author diversity scorer walks candidates in descending score order and applies an exponential decay to each repeated author. Even if one account dominates your highest scores, later posts from that account get progressively smaller multipliers, so your feed isn't flooded by a single creator.

4

Out-of-network posts are at a disadvantage

After the diversity step, out-of-network posts get score × OON_WEIGHT_FACTOR (less than 1). However, new users get a higher OON factor to help them discover content before building a follow graph, and topic feeds get their own OON factor.

5

Scores are consistent and cacheable

Candidate isolation in the transformer means a post's score doesn't change based on what else is in the request. This makes scored posts cacheable in Redis — so if you've already paid the ML cost for a post, its score can be reused in future requests.

6

Ads respect safety boundaries

The new ads blending system includes brand safety hydrators that track safety labels on organic content. Ads are not injected adjacent to content that violates brand safety thresholds, and the injection positions are validated against the organic post layout.

What's New

The May 15, 2026 open-source release is a major update. Here's what changed.

End-to-end pipeline

Unified inference entry point

A new phoenix/run_pipeline.py replaces separate retrieval and ranking scripts. It runs the full retrieval → ranking chain from a single command using exported checkpoints, matching how the two stages are composed in production.

Pre-trained artifacts

Run inference out of the box

A ~3 GB pre-trained mini Phoenix model (256-dim embeddings, 4 attention heads, 2 transformer layers) is packaged via Git LFS. You can run inference immediately without training a model from scratch, using a 537K sports-post demo corpus.

Grox content pipeline

Full content understanding service

The new grox/ directory adds classifiers, embedders, and a task engine for content understanding workloads: spam detection, topic classification, multimodal embeddings, and PTOS policy enforcement — all powered by Grok's VLM.

Ads blending

Transparent ad injection system

The home-mixer/ads/ module handles ad injection and positioning within the organic feed, including brand-safety tracking that prevents ads from appearing adjacent to sensitive content.

Query hydrators

Richer user context

New query hydrators populate followed topics, starter packs, impression bloom filters, IP addresses, mutual follow graphs, and served history — giving the ranker substantially richer user context.

Candidate hydrators

Richer post signals

Additional hydrators for engagement counts, brand safety signals, language codes, media detection, quote post expansion, and mutual follow Jaccard scores enrich each candidate before scoring.
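
For reference, a Jaccard score is the size of the intersection of two sets divided by the size of their union. The sketch below assumes the two sets are follow-graph neighbourhoods of the viewer and the author; which sets the hydrator actually compares is not stated in this guide, and jaccard is a hypothetical function name.

```rust
use std::collections::HashSet;

/// Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of account IDs.
/// Returns 0.0 when both sets are empty (a convention chosen for this sketch).
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    let intersection = a.intersection(b).count();
    let union = a.len() + b.len() - intersection;
    if union == 0 {
        0.0
    } else {
        intersection as f64 / union as f64
    }
}
```

For the sets {1, 2, 3} and {2, 3, 4}, the intersection has 2 elements and the union has 4, giving a score of 0.5.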

Candidate sources

More content types

New sources for ads, Who-to-Follow recommendations, Phoenix MoE (Mixture-of-Experts), Phoenix Topics, and prompts are now included alongside the core Thunder and Phoenix retrieval sources.