Technologies Used in building LinkedIn Website and Application

LinkedIn’s architecture is a masterclass in handling "hyperscale." It transitioned from a monolithic Java application (the "Leo" era) to a highly decoupled, polyglot microservices ecosystem.

Here is the breakdown of the project structure and technical stack as of 2026.

🏗️ High-Level Project Structure

LinkedIn uses a Service-Oriented Architecture (SOA) with thousands of microservices. The project is organized into "Domains" (e.g., Identity, Feed, Messaging), each containing multiple services.

  • API Gateway Layer: Acts as the entry point, handling authentication and request routing.

  • Service Layer: Business logic resides in Rest.li-based microservices.

  • Data Layer: Specialized databases for different use cases (Relational, Graph, Document).

  • Streaming Layer: The "nervous system" (Kafka) that connects all services for real-time updates.
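
As a rough sketch, the flow through these layers can be collapsed into a few lines (all class, route, and store names below are invented for illustration and are not LinkedIn's actual code):

```java
import java.util.Map;

// Illustrative model of gateway -> service -> data layering; names are hypothetical.
public class LayeredRequest {

    // API Gateway layer: authenticate, then route to the owning domain service.
    static String handle(String authToken, String path) {
        if (authToken == null || authToken.isEmpty()) {
            return "401 Unauthorized";
        }
        return route(path);
    }

    // Service layer: business logic per domain (Identity, Feed, ...).
    static String route(String path) {
        if (path.startsWith("/identity")) return fetchFromDataLayer("espresso");
        if (path.startsWith("/feed"))     return fetchFromDataLayer("venice");
        return "404 Not Found";
    }

    // Data layer: each use case reads from the store suited to it.
    static String fetchFromDataLayer(String store) {
        Map<String, String> stores = Map.of(
            "espresso", "profile-document",
            "venice",   "derived-feed-data");
        return "200 " + stores.getOrDefault(store, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(handle("token-abc", "/identity/me")); // 200 profile-document
        System.out.println(handle(null, "/feed/home"));          // 401 Unauthorized
    }
}
```

In production these are separate deployables communicating over Rest.li, with Kafka events flowing between them; collapsing them into one class just makes the routing idea concrete.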

💻 Frontend Technology Stack

LinkedIn uses a hybrid approach, moving away from legacy frameworks toward modern, high-performance libraries.

| Component | Technology | Role |
| --- | --- | --- |
| Web Library | React / TypeScript | Primary library for modern UI components and high interactivity. |
| Legacy Web | Ember.js | Still present in parts of the older desktop experience. |
| Mobile (iOS) | Swift | Native development for performance and a fluid UI. |
| Mobile (Android) | Kotlin | Native development using Jetpack Compose for modern layouts. |
| State Management | Redux / Apollo Client | Manages complex global state and GraphQL data. |
| Rendering | Node.js (Play) | Server-Side Rendering (SSR) to improve SEO and initial load time. |

⚙️ Backend & Infrastructure

The backend is primarily Java-based, optimized for the JVM.

1. Primary Frameworks & Utilities

  • Rest.li: A LinkedIn-original REST+JSON framework. It provides type-safe, non-blocking asynchronous APIs.

  • Play Framework: Used specifically for the web tier to handle high-concurrency I/O using Scala and Java.

  • D2 (Dynamic Discovery): A client-side load balancing and service discovery utility.

  • Spring Boot: Used for internal tooling and newer microservices.

2. Data & Storage (Polyglot Persistence)

LinkedIn doesn't use one database; it uses the right database for each workload:

  • Espresso: A distributed NoSQL document store (built in-house) for member profiles.

  • Venice: For derived data serving (e.g., "People You May Know" results).

  • Liquid: An in-house graph database that manages the professional social graph (who follows whom).

  • MySQL: Used for transactional data that requires strict ACID compliance.

📦 Package Managers & Dependencies

Managing thousands of developers requires strict versioning and build tools.

  • Java/Scala: Gradle is the primary build tool, often enhanced by Nebula (a set of Gradle plugins originally developed at Netflix) to handle complex dependency locks.

  • JavaScript/Frontend: npm or Yarn, with pnpm for monorepo workspace management.

  • Third-party Utilities:

    • Apache Kafka: Created at LinkedIn; handles trillions of events daily.

    • Apache Samza: For stateful stream processing.

    • Apache Pinot: For real-time OLAP (Online Analytical Processing) to power "Who viewed your profile."

    • Protocol Buffers (Protobuf): For high-speed data serialization between services.

🚀 DevOps & CI/CD

  • Orchestration: Kubernetes (migrating toward Azure Kubernetes Service for cloud-native parts) and Apache Mesos (legacy).

  • CI/CD: Jenkins and GitHub Actions for automated testing and deployment.

  • Infrastructure as Code: Terraform and Bicep (for Azure integrations).

  • Monitoring: InGraphs (internal) and Prometheus/Grafana for real-time observability.

LinkedIn’s ability to serve nearly a billion members rests on a "Data-First" philosophy. To understand their stack, you have to look at how they handle evolution—specifically through their homegrown Rest.li framework and their real-time Feed architecture.

🛠️ The Rest.li Framework: Evolution without Breaking

Rest.li is more than just a REST framework; it is a Contract-First system. While many companies use manual documentation (Swagger/OpenAPI), LinkedIn uses Pegasus (PDL) to define data models.

1. API Versioning Strategy (2026 Standards)

LinkedIn recently shifted its external Marketing APIs to a Monthly Versioning cycle, but internally, they use a sophisticated "Snapshot" system.

  • Header-Based Versioning: Instead of messy URLs like /v1/profile, they use a custom header: LinkedIn-Version: 202602.

  • Semantic Compatibility: The framework includes a Compatibility Checker integrated into the CI/CD pipeline. If a developer tries to remove a field or change a data type that would break a client, the build fails automatically.

  • The $UNKNOWN Member: To handle enums safely, Rest.li adds an $UNKNOWN symbol. If a new version adds an enum value (e.g., a new "Skill" category) and an old client receives it, it doesn't crash; it gracefully maps to $UNKNOWN.
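
The $UNKNOWN pattern is easy to sketch in plain Java (the enum below is hypothetical; Rest.li's code generator emits the equivalent fallback automatically in its client bindings):

```java
// Hedged sketch of the $UNKNOWN forward-compatibility pattern.
public class ForwardCompatibleEnum {

    enum MemberStatus {
        ACTIVE, OPEN_TO_WORK, HIRING, DEACTIVATED,
        $UNKNOWN; // '$' is a legal Java identifier; mirrors Rest.li's generated fallback

        // An old client mapping a wire value it has never seen falls back to
        // $UNKNOWN instead of throwing IllegalArgumentException.
        static MemberStatus fromWire(String symbol) {
            for (MemberStatus s : values()) {
                if (s.name().equals(symbol)) return s;
            }
            return $UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(MemberStatus.fromWire("HIRING"));  // known value
        System.out.println(MemberStatus.fromWire("RETIRED")); // new server-side value
    }
}
```

The design choice: a schema can gain enum values without every client redeploying, because deserialization degrades gracefully instead of failing.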

2. Universal Schema Registry (USR)

Since data flows from Rest.li (APIs) to Kafka (Streams) to Espresso (Database), LinkedIn uses a Universal Schema Registry. This ensures that a field change in an API is automatically validated against the downstream database schema.
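
A toy version of such a compatibility check might look like this (the real checker operates on full Pegasus schemas; this sketch reduces a schema to a set of name/type pairs):

```java
import java.util.Set;

// Toy backward-compatibility check: removing a field or changing its type is a
// breaking change; adding a new field is not. Illustrative only.
public class CompatChecker {

    record Field(String name, String type) {}

    // The new schema is backward compatible if every old field survives
    // with the same name and type.
    static boolean isBackwardCompatible(Set<Field> oldSchema, Set<Field> newSchema) {
        return newSchema.containsAll(oldSchema);
    }

    public static void main(String[] args) {
        Set<Field> v1 = Set.of(new Field("id", "string"), new Field("headline", "string"));
        Set<Field> v2 = Set.of(new Field("id", "string"), new Field("headline", "string"),
                               new Field("pronouns", "string"));
        Set<Field> broken = Set.of(new Field("id", "long")); // type changed, field removed
        System.out.println(isBackwardCompatible(v1, v2));     // true: additive change
        System.out.println(isBackwardCompatible(v1, broken)); // false: build would fail
    }
}
```

Wiring a check like this into CI is what turns "please don't break the API" from a convention into an enforced invariant.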

🏗️ The LinkedIn Feed Architecture

The Feed is LinkedIn’s most complex product. It doesn't just "fetch" posts; it assembles them in real-time using a multi-stage AI pipeline.

1. The Real-Time Pipeline (Apache Samza)

When you post an update, it doesn't just sit in a database. It enters a Kafka topic.

  • Apache Samza (the stream processor) picks it up instantly.

  • It performs Stateful Processing: Joining your post data with your "Professional Graph" (who your connections are) to decide who should see it.

  • ATC (Air Traffic Controller): A Samza-based system that decides whether to send you a push notification, an email, or just show the post in your feed, based on your real-time "dwell time" (how long you stay on the app).
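
The ATC decision above can be caricatured in a few lines (the thresholds and names are invented for illustration; the real system uses learned models over many engagement signals):

```java
// Toy model of an ATC-style channel decision driven by recent dwell time.
public class AirTrafficController {

    enum Channel { PUSH_NOTIFICATION, EMAIL, FEED_ONLY }

    // An engaged member (high recent dwell time) will see the post in-feed
    // anyway; a dormant member gets nudged through a heavier channel.
    static Channel decide(double dwellSecondsLast7d) {
        if (dwellSecondsLast7d > 600) return Channel.FEED_ONLY;
        if (dwellSecondsLast7d > 60)  return Channel.PUSH_NOTIFICATION;
        return Channel.EMAIL;
    }

    public static void main(String[] args) {
        System.out.println(decide(1200)); // FEED_ONLY
        System.out.println(decide(120));  // PUSH_NOTIFICATION
        System.out.println(decide(5));    // EMAIL
    }
}
```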

2. The Ranking Stages

In 2026, the feed algorithm has moved from "Virality" to "Precision."

  1. Scoring: Machine Learning models (stored in RocksDB locally on the service nodes) score thousands of potential posts.

  2. Filtering: The system filters out "engagement bait" and low-quality AI-generated content.

  3. Blending: The "Feed Mixer" service combines organic posts, "Promoted" (Ads) content, and "People You May Know" widgets into a single stream.

📦 Summary of Tools & Dependencies (The "2026 Stack")

| Category | Primary Technology | Why? |
| --- | --- | --- |
| Build Tool | Gradle + Nebula | Handles complex dependency locking for monorepos. |
| Service Discovery | D2 (Dynamic Discovery) | Routes traffic based on service health and latency. |
| Database (Primary) | Espresso | High-availability document store for profiles. |
| Search Engine | Galene | Custom-built search for professional entities. |
| Async Processing | ParSeq | A Java library to manage complex asynchronous task trees. |
| Serialization | Avro / Protobuf | Compact, fast data transfer between microservices. |

Pro-Tip: If you are building a system modeled after LinkedIn, look into ParSeq. It allows you to run multiple backend requests in parallel (e.g., fetching a profile, their posts, and their connections) and merges them into one response without blocking the main thread.

 

📋 Quick Stack Summary

  • Frontend: React, TypeScript/JavaScript, and HTML5

  • Backend: Java, Scala, and Python

  • Database: MySQL, Oracle, and Voldemort (legacy; Voldemort has been superseded by Venice)

  • Package Managers: Maven (legacy) and Gradle

  • Frameworks: Spring, Apache Kafka, and Apache Hadoop

  • Tools: Jenkins, Git, and Docker

To truly understand how LinkedIn scales, you have to look at the "Contract" that connects the Frontend to the Backend. This is done via PDL (Pegasus Data Language), the schema language for the Rest.li framework.

📄 The "Professional Profile" PDL Schema

In a typical LinkedIn microservice, you don't just write a Java class or a JSON object. You define a .pdl file. This file acts as the Single Source of Truth—from it, LinkedIn automatically generates Java client code, documentation, and database validation logic.

Here is what a simplified Member Profile schema looks like in 2026:

```pdl
namespace com.linkedin.identity.api

/** A record representing a professional member on the platform */
record MemberProfile {

  /** Unique Identifier (URN) for the member */
  id: string

  /** The member's primary professional headline */
  @validate.strlen = { "max" : 220 }
  headline: optional string

  /** Current employment status */
  status: enum MemberStatus {
    ACTIVE
    OPEN_TO_WORK
    HIRING
    DEACTIVATED
    $UNKNOWN  // Forward-compatibility member; in practice added by the code generator, not written by hand
  }

  /** Member's current location details */
  location: record Location {
    city: string
    countryCode: string
    postalCode: optional string
  }

  /** List of skills with custom validation */
  skills: array[record Skill {
    skillId: long
    name: string
    endorsementCount: int = 0
  }]
}
```

Why this is better than standard JSON:

  • The $UNKNOWN Member: If a later version adds a RETIRED status, old apps won't crash when they see it; they’ll just treat it as $UNKNOWN.

  • Annotations (@validate): Validation is baked into the schema. The backend service won't even receive the request if the headline is 221 characters long.

  • Code Generation: LinkedIn developers never write "Getters" or "Setters" by hand. The build tool (Gradle) generates a MemberProfile.java class that is perfectly synced with this schema.
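
At runtime, the schema-driven validation boils down to a hard check like this hand-written sketch (Rest.li generates the real equivalent from the @validate annotation; the class and method names here are invented):

```java
// Sketch of what @validate.strlen = { "max": 220 } amounts to at runtime:
// the request is rejected before business logic ever runs.
public class HeadlineValidator {

    static final int MAX_HEADLINE = 220; // from the @validate.strlen constraint

    static void validateHeadline(String headline) {
        if (headline != null && headline.length() > MAX_HEADLINE) {
            throw new IllegalArgumentException(
                "headline exceeds " + MAX_HEADLINE + " characters");
        }
    }

    public static void main(String[] args) {
        validateHeadline("Staff Engineer at Example Corp"); // passes silently
        try {
            validateHeadline("x".repeat(221));              // one character too long
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```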

🧱 Feed Assembly: The Multi-Stage Pipeline

The LinkedIn Feed is the most intensive part of the infrastructure. It uses a Stage-Based Pipeline to move data from raw posts to your screen.

1. Ingestion & Indexing (The Producer)

When a user clicks "Post":

  1. Rest.li Gateway validates the post against the PDL schema.

  2. Kafka receives the event.

  3. Search Indexer (Galene) immediately indexes the text so it’s searchable.

2. Retrieval (The "Fan-Out")

When you open the app:

  • The Feed Mixer service asks the Graph Database (Liquid): "Who is this person connected to?"

  • It then "Fans-out" queries to thousands of shards in Espresso to find recent posts from those connections.
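
The fan-out step can be sketched with plain CompletableFutures (the shard lookup is stubbed and all names are invented; the real system queries Espresso partitions over the network):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

// Toy read-time fan-out: query many shards in parallel, then merge the
// results into one candidate list for ranking.
public class FeedFanOut {

    // Stub for "fetch recent posts from one shard for these connections".
    static CompletableFuture<List<String>> queryShard(int shard, List<String> connections) {
        return CompletableFuture.supplyAsync(
            () -> connections.stream().map(c -> c + "@shard" + shard).toList());
    }

    static List<String> fetchFeed(List<String> connections, int shardCount) {
        List<CompletableFuture<List<String>>> futures =
            IntStream.range(0, shardCount)
                .mapToObj(s -> queryShard(s, connections))
                .toList();
        // All shard queries are in flight before we wait on any of them,
        // so total latency tracks the slowest shard, not the sum of all.
        return futures.stream()
            .flatMap(f -> f.join().stream())
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(fetchFeed(List.of("alice", "bob"), 3)); // 2 connections x 3 shards
    }
}
```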

3. AI Ranking (The Multi-Layer Perceptron)

LinkedIn uses a two-pass ranking system:

  • Lightweight Ranking: A fast model narrows down 1,000 potential posts to the top 100.

  • Heavyweight Ranking: A complex Deep Learning model (running on specialized hardware) predicts the probability of you clicking "Like" or "Comment" on each of those 100 posts.
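
The two-pass idea can be sketched as follows (both scoring functions are invented stand-ins for the real models; the point is that the expensive model only ever sees the survivors of the cheap pass):

```java
import java.util.Comparator;
import java.util.List;

// Toy two-pass ranker: a cheap score prunes the pool, then an "expensive"
// score reorders only the survivors.
public class TwoPassRanker {

    record Post(String id, double cheapFeature, double richFeature) {}

    // Pass 1: lightweight score over a precomputed feature.
    static double lightScore(Post p) { return p.cheapFeature; }

    // Pass 2: pretend this is the heavyweight deep model.
    static double heavyScore(Post p) { return 0.3 * p.cheapFeature + 0.7 * p.richFeature; }

    static List<Post> rank(List<Post> candidates, int prunedSize) {
        return candidates.stream()
            .sorted(Comparator.comparingDouble(TwoPassRanker::lightScore).reversed())
            .limit(prunedSize) // e.g. 1,000 candidates -> top 100
            .sorted(Comparator.comparingDouble(TwoPassRanker::heavyScore).reversed())
            .toList();
    }

    public static void main(String[] args) {
        List<Post> posts = List.of(
            new Post("a", 0.9, 0.1),
            new Post("b", 0.8, 0.9),
            new Post("c", 0.1, 1.0)); // pruned in pass 1 despite a strong rich feature
        System.out.println(rank(posts, 2)); // "b" outranks "a" after the heavy pass
    }
}
```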

🛠️ Summary of Third-Party & Internal Utilities

| Category | Utility | Purpose |
| --- | --- | --- |
| Concurrency | ParSeq | Manages thousands of parallel async "tasks" without thread exhaustion. |
| API Transport | R2 | An abstraction layer over Netty that handles the actual HTTP/JSON byte-shuffling. |
| Load Balancing | D2 | A "smart" client that knows which server is busy and routes traffic elsewhere. |
| Streaming | Apache Samza | Does the "heavy lifting" of calculating real-time analytics for the feed. |

To wrap up our deep dive into LinkedIn’s architecture, we need to look at the "glue" that keeps thousands of developers from stepping on each other's toes: ParSeq for logic and the Monorepo for code management.

🏎️ ParSeq: Orchestrating the Backend

In a traditional backend, if you need to fetch a user's profile, their latest 5 posts, and their connection count, you might wait for each one sequentially (Blocking) or deal with "Callback Hell."

LinkedIn uses ParSeq to treat every operation as a Task. These tasks are "lazy" and only execute when the engine determines the optimal path.

Example: Parallel Data Fetching

Instead of writing complex threading code, a LinkedIn developer writes:

```java
// Define three independent tasks
Task<Member> member = fetchMember(id);
Task<List<Post>> posts = fetchPosts(id);
Task<Integer> connections = fetchConnectionCount(id);

// Tell ParSeq to run them in parallel and combine the results
Task<String> profilePage = Task.par(member, posts, connections)
    .map((m, p, c) -> assembleProfile(m, p, c));

// Run it!
engine.run(profilePage);
```

  • Why this wins: If fetchPosts takes 200ms and fetchMember takes 50ms, the total time is 200ms, not 250ms.

  • Traceability: ParSeq generates a "Waterfall" diagram for every request, letting engineers see exactly which microservice is lagging in real-time.

📦 The Monorepo & Dependency Strategy

LinkedIn manages most of its code in a massive Monorepo. This allows for a Single Version Policy: every service in the company uses the same version of a library (e.g., Guava or Jackson).

How they keep it from breaking:

  • Build Caching: Since the repo is huge, they use tools like Gradle with remote caching. If a teammate already built the "Auth" module, your computer just downloads the binary instead of recompiling it.

  • Atomic Refactoring: If an engineer wants to update a core library, they can change the code and update all 1,000 dependent services in a single "Atomic" commit. This prevents "Dependency Hell" where Service A uses v1.0 and Service B uses v2.0 of the same tool.

  • Merge Queues: To prevent the "Main" branch from breaking, they use a Merge Queue system. It runs tests on a virtual merge of your code with the latest master before actually letting it in.
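
The build-caching idea reduces to content-addressed lookups, sketched here as a toy in-memory cache (Gradle's real cache is keyed on hashed task inputs and shared over the network; the class here is invented):

```java
import java.util.HashMap;
import java.util.Map;

// Toy content-addressed build cache: a module is recompiled only when the
// hash of its inputs changes, which is the core idea behind remote caching.
public class BuildCache {

    final Map<String, String> cache = new HashMap<>(); // inputsHash -> artifact
    int compileCount = 0;

    String build(String moduleName, String inputsHash) {
        return cache.computeIfAbsent(inputsHash, h -> {
            compileCount++; // cache miss: actually compile
            return moduleName + ".jar@" + h;
        });
    }

    public static void main(String[] args) {
        BuildCache c = new BuildCache();
        c.build("auth", "hash-1");
        c.build("auth", "hash-1"); // teammate already built it: cache hit
        c.build("auth", "hash-2"); // source changed: rebuild
        System.out.println(c.compileCount); // 2
    }
}
```

In the remote-cache case the Map is a shared service, so one engineer's (or CI's) compile populates the artifact for everyone else.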

🗺️ The 2026 Tech Stack Summary

| Layer | Primary Tech | LinkedIn-Specific Secret Sauce |
| --- | --- | --- |
| API | Rest.li / PDL | Pegasus: strict schema-first contracts. |
| Concurrency | Java / ParSeq | Plan tracing: visualizing every async task. |
| Storage | Espresso / Venice | Change Data Capture (CDC): real-time database syncing. |
| Messaging | Apache Kafka | LiX: experimentation framework built on top of events. |
| Frontend | React / TypeScript | Kirby: internal UI component system. |

LinkedIn’s stack is designed to solve one problem: Complexity at Scale. By enforcing strict schemas (PDL), using smart concurrency (ParSeq), and keeping code centralized (Monorepo), they ensure that a billion professional profiles stay synced in under a second.
