Secure Isolation by Design: How SSoC 2026 Protects 50,000 Contributors Without a Production Database

Over the past few weeks, several large open-source programmes have experienced data exposure incidents — contributor email addresses, phone numbers, and personal metadata leaked through insecure database configurations and overly permissive API endpoints. The developer community has rightly asked difficult questions about how programmes handling tens of thousands of student developers should be architecting their systems.

Contributors to Social Summer of Code (SSoC) 2026 have asked me the same question:

"SSoC collects our email addresses and phone numbers for onboarding and certification. How do you prevent the kind of database leaks we've recently seen elsewhere?"

This article is my answer. Not a marketing piece. Not a "we're unhackable" press release. A genuine engineering deep-dive into the architectural decisions that shape how SSoC handles contributor data — and the philosophy behind those decisions.

The central thesis is simple:

Security is not primarily about encryption, firewalls, or databases. Security starts with architecture. The best defence is often ensuring sensitive data never reaches systems that don't need it.

I call this philosophy Secure Isolation by Design.

The Threat Model

Before discussing architecture, let's establish what we're defending against. SSoC collects the following PII (Personally Identifiable Information) during registration:

Data Type	Purpose	Sensitivity
Full Name	Display, certificates	Low
Email Address	Onboarding, notifications, certificates	High
Phone Number	Emergency contact, WhatsApp groups	High
GitHub Username	Scoring, attribution	Low (public)
LinkedIn URL	Networking, verification	Low (public)
Discord Username	Community communication	Medium

The threat model is straightforward:

External attackers probing public infrastructure for exposed databases, APIs, or admin panels
Accidental exposure through misconfigured deployments, leaked environment variables, or overly broad API responses
Supply chain risks from dependencies with access to runtime data
Insider risk from overly broad access to administrative systems

The typical response to these threats is defence-in-depth: encrypt the database, add authentication, set up WAF rules, rotate keys, monitor access logs. All of those are valid. But they all assume the sensitive data is in the system in the first place.

What if it isn't?

Architecture Overview

SSoC 2026 serves hundreds of open-source projects, real-time leaderboards, contributor profiles, certificate verification, and administrative tooling for a programme with 50,000+ contributors. Here's how the system is structured:

mermaid

graph TB
    subgraph "Public Internet"
        USER[Contributors / Public]
    end

    subgraph "Public Platform (Static Site)"
        SITE[Static Website<br/>React + Vite]
        LB_DATA[user-scores.js<br/>Leaderboard Data]
        PR_DATA[pr-list.js<br/>PR Details]
        PROJ_DATA[projects.json<br/>Project Metadata]
        CERT_DATA[Certificate Data]
    end

    subgraph "Build Pipeline (Offline)"
        SCORING[Node.js Scoring Engine<br/>single.js]
        GH_API[GitHub GraphQL API<br/>Public PR Data]
        BONUS[Bonus Points Repo<br/>GitHub]
    end

    subgraph "Administrative Layer (Local Only)"
        GC[Ground Control<br/>Admin Interface]
        MASTER[Master TSV<br/>Registration Data]
        LOCAL_SCRIPTS[Local Processing<br/>Validation Scripts]
    end

    USER -->|HTTPS| SITE
    SITE --- LB_DATA
    SITE --- PR_DATA
    SITE --- PROJ_DATA
    SITE --- CERT_DATA

    GH_API -->|Public metadata| SCORING
    BONUS -->|Bonus points| SCORING
    SCORING -->|Generated static files| LB_DATA
    SCORING -->|Generated static files| PR_DATA

    MASTER -->|Local fetch only| GC
    MASTER -->|Local input| LOCAL_SCRIPTS

    style MASTER fill:#ff6b6b,stroke:#c0392b,color:#fff
    style GC fill:#ff6b6b,stroke:#c0392b,color:#fff
    style LOCAL_SCRIPTS fill:#ff6b6b,stroke:#c0392b,color:#fff
    style SITE fill:#2ecc71,stroke:#27ae60,color:#fff
    style LB_DATA fill:#2ecc71,stroke:#27ae60,color:#fff
    style PR_DATA fill:#2ecc71,stroke:#27ae60,color:#fff
    style PROJ_DATA fill:#2ecc71,stroke:#27ae60,color:#fff

The red components never touch the public internet. The green components contain no high-sensitivity PII: no email addresses and no phone numbers. The two layers share no runtime connection.

1. The Public Platform: A Database-Free Architecture

The SSoC public website — the portal that 50,000+ contributors interact with — is a static React application built with Vite. It has:

No backend server — no Express, no Fastify, no serverless functions
No database — no MongoDB, no PostgreSQL, no Firebase Realtime Database
No authentication layer — no JWT tokens, no session cookies, no OAuth flows
No API endpoints — no REST routes, no GraphQL resolvers, no WebSocket connections

Everything the public sees is pre-generated.

How Leaderboards Work Without a Database

The scoring engine is a Node.js script (single.js) that runs offline. It:

Queries the GitHub GraphQL API for public PR metadata across all programme repositories
Evaluates each PR against scoring rules (difficulty labels, blacklists, registered users)
Fetches bonus points from a separate GitHub repository
Generates static JavaScript files containing the computed results

javascript

// Output: user-scores.js (loaded via <script> tag)
window.userScores = {
  users: {
    "contributor-a": {
      totalScore: 450,
      prCount: 12,
      prsByLevel: { Easy: 3, Medium: 5, Hard: 2, Advanced: 2 },
      bonusScore: 50,
      // No email. No phone. No PII.
    },
    // ... 50,000 more entries
  },
  summary: { totalPRs: 87432, totalPoints: 2156000 }
};

The output files are committed and deployed as static assets. The website loads them via <script> tags — no fetch calls, no API requests, no database queries.
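The pipeline above can be sketched in a few lines. This is a minimal illustration, not the real single.js: the point values in LEVEL_POINTS, the shape of the PR records, and the function names buildUserScores and renderScoresFile are all assumptions made for the example.

```javascript
// Hypothetical point values per difficulty label (not SSoC's real scoring rules).
const LEVEL_POINTS = new Map([
  ["Easy", 10],
  ["Medium", 25],
  ["Hard", 45],
  ["Advanced", 70],
]);

// Returns the entry for a user, creating an empty one on first use.
function entryFor(users, username) {
  if (!users.has(username)) {
    users.set(username, { totalScore: 0, prCount: 0, prsByLevel: {}, bonusScore: 0 });
  }
  return users.get(username);
}

// prs: [{ author, level }], registered: string[], bonus: { username: points }
function buildUserScores(prs, registered, bonus = {}) {
  const allowed = new Set(registered.map((u) => u.toLowerCase()));
  const users = new Map();
  const summary = { totalPRs: 0, totalPoints: 0 };

  for (const pr of prs) {
    const author = pr.author.toLowerCase();
    const points = LEVEL_POINTS.get(pr.level);
    // Ignore PRs from unregistered authors or with an unknown difficulty label.
    if (!allowed.has(author) || points === undefined) continue;

    const entry = entryFor(users, author);
    entry.totalScore += points;
    entry.prCount += 1;
    entry.prsByLevel[pr.level] = (entry.prsByLevel[pr.level] || 0) + 1;
    summary.totalPRs += 1;
    summary.totalPoints += points;
  }

  for (const [username, points] of Object.entries(bonus)) {
    const user = username.toLowerCase();
    if (!allowed.has(user)) continue; // bonus for unregistered users is dropped
    const entry = entryFor(users, user);
    entry.bonusScore += points;
    entry.totalScore += points;
    summary.totalPoints += points;
  }

  return { users: Object.fromEntries(users), summary };
}

// Serialises the result as a script that assigns a single global.
function renderScoresFile(scores) {
  return "window.userScores = " + JSON.stringify(scores, null, 2) + ";\n";
}
```

The only inputs are public PR metadata, a list of usernames, and a bonus map, so there is no field through which an email address or phone number could reach the output.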

"The safest database query is the one you never have to make."

This isn't a limitation. It's the design. A static architecture means:

Attack Vector	Typical Web App	SSoC Public Platform
SQL/NoSQL Injection	Possible	Not applicable — no database
API enumeration	Possible	Not applicable — no API
Authentication bypass	Possible	Not applicable — no auth layer
Session hijacking	Possible	Not applicable — no sessions
Server-side request forgery	Possible	Not applicable — no server
Database credential leak	Possible	Not applicable — no credentials
Exposed admin API	Possible	Not applicable — no API

The attack surface isn't "well-defended." It largely doesn't exist.

What About Dynamic Features?

The platform has interactive features — search, filtering, score breakdowns, certificate validation, PR lookup. All of these operate entirely client-side against the pre-generated data. The search bar on the Projects page doesn't query a database; it filters a JSON array already loaded in the browser.

tsx

// Client-side search — no server round-trip
const filtered = projects.filter(p =>
  p.name.toLowerCase().includes(query.toLowerCase()) ||
  p.owner.toLowerCase().includes(query.toLowerCase()) ||
  p.techStack.some(t => t.toLowerCase().includes(query.toLowerCase()))
);

2. PII Separation: The Data That Never Deploys

SSoC collects contributor PII during registration through Google Forms. This data flows into a Google Sheet, which is exported as a TSV (Tab-Separated Values) file for administrative processing.

Here's the critical architectural decision: this file never enters the deployment pipeline.

mermaid

graph LR
    subgraph "Registration Flow"
        FORM[Google Form] -->|Responses| SHEET[Google Sheet]
        SHEET -->|Manual TSV export| LOCAL[Local Machine<br/>MasterSheetsData.tsv]
    end

    subgraph "Public Deployment"
        BUILD[Vite Build] -->|Static assets| CDN[GitHub Pages / CDN]
    end

    LOCAL -.-x|NEVER| BUILD

    style LOCAL fill:#ff6b6b,stroke:#c0392b,color:#fff
    style CDN fill:#2ecc71,stroke:#27ae60,color:#fff

The TSV contains email addresses, phone numbers, LinkedIn URLs, and role information. It exists on my local machine. It is listed in .gitignore. It is never committed to the repository. It is never included in the build output. It is never uploaded to any server.
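A claim like "never included in the build output" is easier to trust when a script checks it before every deploy. The following is a rough sketch of such a check, assuming the build output has already been read into memory as a list of { path, content } records. The function name findPiiLeaks, the email pattern, and the allowlist parameter are illustrative and not part of the SSoC codebase.

```javascript
// Deliberately broad pattern for anything that looks like an email address.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// files: [{ path, content }]; allowedEmails: addresses that are meant to be public.
// Returns a list of findings; an empty list means nothing suspicious was seen.
function findPiiLeaks(files, allowedEmails = []) {
  const allowed = new Set(allowedEmails.map((e) => e.toLowerCase()));
  const findings = [];

  for (const { path, content } of files) {
    // A TSV in the output is suspicious regardless of what it contains.
    if (/\.tsv$/i.test(path)) {
      findings.push({ path, reason: "TSV file in build output" });
    }
    const emails = content.match(EMAIL_PATTERN) || [];
    for (const email of emails) {
      if (!allowed.has(email.toLowerCase())) {
        // The matched value is not logged, to avoid copying PII into build logs.
        findings.push({ path, reason: "email-like string" });
      }
    }
  }
  return findings;
}
```

The pattern is broad on purpose, so it will flag some harmless strings such as icon@2x.png. For a pre-deploy gate, a false alarm costs far less than a missed leak.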

Data Minimisation in Practice

The scoring engine needs to know which GitHub usernames are registered participants — but it doesn't need their email addresses or phone numbers to calculate scores. So the input to the scoring engine is a minimal users.json file:

json

["contributor-a", "contributor-b", "contributor-c"]

Just usernames. Generated from the Master TSV by extracting a single column. The scoring engine never sees the full registration data.

This is GDPR's data minimisation principle in practice: each system component receives only the minimum data required for its specific function.
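Extracting that single column could look something like the sketch below. It rests on assumptions: the column header "GitHub Username" is a guess at how the sheet is labelled, extractColumn is a hypothetical helper, and the parser ignores quoted fields and embedded newlines.

```javascript
// Returns the sorted, de-duplicated, lower-cased values of one TSV column.
function extractColumn(tsvText, columnName) {
  const lines = tsvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];

  const header = lines[0].split("\t").map((h) => h.trim());
  const index = header.indexOf(columnName);
  if (index === -1) throw new Error(`Column not found: ${columnName}`);

  const values = new Set();
  for (const line of lines.slice(1)) {
    const cell = line.split("\t")[index] || "";
    // Strip a leading "@" so "@name" and "name" count as the same user.
    const value = cell.trim().replace(/^@/, "").toLowerCase();
    if (value !== "") values.add(value);
  }
  return [...values].sort();
}
```

Only the returned array would be written to users.json. The TSV text itself stays on the local machine.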

3. Ground Control: The Admin Interface That Can't Be Replicated

Ground Control is the internal administrative interface for SSoC. It provides:

Contributor validation (name formatting, email checks, phone number cleanup)
Duplicate detection across registration entries
PR cross-referencing with scoring data
Diagnostic panels for data quality
Discord ID and GitHub username lookup

The route exists in the React application at /ground-control. It ships in the production bundle. You can navigate to it right now. But here's what happens when you do:

javascript

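// Ground Control's data loading
useEffect(() => {
  fetch("/LocalData/MasterSheetsData.tsv")
    .then(res => {
      // fetch() resolves on HTTP errors, so a 404 must be rejected explicitly
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res.text();
    })
    .then(text => parseTSV(text))
    .catch(() => setError("Data not available"));
}, []);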

It tries to fetch MasterSheetsData.tsv from a local path. On the production deployment, that file doesn't exist. The fetch returns a 404. Ground Control renders an empty state. There's nothing to see.

This is not security through obscurity. The route isn't hidden (it's linked from the Tools page). The code isn't obfuscated. The approach is simpler than that: the administrative interface depends on data that is architecturally absent from the public deployment. Even if you find the route, read the source code, and understand exactly how it works, you cannot reproduce the administrative workflow because the underlying dataset isn't there.

mermaid

graph TB
    subgraph "Production Deployment"
        ROUTE["/ground-control Route"]
        FETCH["fetch /LocalData/MasterSheetsData.tsv"]
        FOUR04[404 Not Found]
        EMPTY[Empty State UI]
    end

    subgraph "Local Development"
        ROUTE_L["/ground-control Route"]
        FETCH_L["fetch /LocalData/MasterSheetsData.tsv"]
        DATA[MasterSheetsData.tsv<br/>Email, Phone, PII]
        FULL[Full Admin Interface]
    end

    ROUTE --> FETCH --> FOUR04 --> EMPTY
    ROUTE_L --> FETCH_L --> DATA --> FULL

    style FOUR04 fill:#e74c3c,stroke:#c0392b,color:#fff
    style DATA fill:#ff6b6b,stroke:#c0392b,color:#fff
    style EMPTY fill:#95a5a6,stroke:#7f8c8d,color:#fff
    style FULL fill:#2ecc71,stroke:#27ae60,color:#fff

Why Not Remove the Route Entirely?

A reasonable question. The answer is developer workflow. Ground Control is used during local development and programme operations. Maintaining a separate build configuration to strip it from production adds complexity and creates a divergence between development and production builds — which introduces its own class of bugs. The simpler, more reliable approach: let the route exist, ensure it has no data to display.

4. Local-First Administration

Administrative processing — the work that actually touches PII — happens on my local machine. Not on a server. Not in a cloud function. Not behind an authenticated API. Locally.

This includes:

Email validation — regex checks, domain verification, duplicate detection
Phone number formatting — international format normalisation, country code validation
LinkedIn URL cleanup — extracting usernames from various URL formats
Name formatting — Title Case normalisation, Unicode handling
CSV/TSV processing — cross-referencing registration data with raid completions, scoring data
Certificate generation — batch processing with PII for personalisation
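A few of these clean-up steps are small enough to sketch. The helpers below are illustrative only: formatName, extractLinkedInUsername, and findDuplicateEmails are hypothetical names, the rows are assumed to be objects with an email field, and real names need more care than simple title-casing (O'Brien and McDonald, for example, would come out wrong).

```javascript
// Title-cases each word and collapses repeated whitespace.
function formatName(raw) {
  return raw
    .normalize("NFC")
    .trim()
    .split(/\s+/)
    .map((word) => word.charAt(0).toLocaleUpperCase() + word.slice(1).toLocaleLowerCase())
    .join(" ");
}

// Pulls the profile slug out of common LinkedIn URL shapes; null if none found.
function extractLinkedInUsername(url) {
  const match = url.trim().match(/linkedin\.com\/in\/([^/?#\s]+)/i);
  return match ? match[1] : null;
}

// Groups row indexes by normalised email so duplicates can be reviewed by hand.
// Returns [[email, [rowIndex, ...]], ...] for emails that appear more than once.
function findDuplicateEmails(rows) {
  const seen = new Map();
  rows.forEach((row, index) => {
    const key = row.email.trim().toLowerCase();
    if (key === "") return;
    seen.set(key, [...(seen.get(key) || []), index]);
  });
  return [...seen.entries()].filter(([, indexes]) => indexes.length > 1);
}
```

All three are pure functions over strings and arrays. They make no network calls, so they behave the same on a machine with no connection at all.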

mermaid

graph LR
    subgraph "Local Machine (Not Reachable from Public Site)"
        TSV[Master TSV<br/>Full PII]
        SCRIPTS[Processing Scripts]
        VALIDATION[Validation Output]
        CERTS[Certificate Data]
    end

    subgraph "Outputs (PII-Free)"
        USERS[users.json<br/>Usernames only]
        PROJECTS[projects.json<br/>Public metadata]
        SCORES[user-scores.js<br/>Scores only]
    end

    TSV --> SCRIPTS
    SCRIPTS --> VALIDATION
    SCRIPTS --> USERS
    SCRIPTS --> PROJECTS
    SCRIPTS --> SCORES

    style TSV fill:#ff6b6b,stroke:#c0392b,color:#fff
    style SCRIPTS fill:#f39c12,stroke:#e67e22,color:#fff
    style USERS fill:#2ecc71,stroke:#27ae60,color:#fff
    style PROJECTS fill:#2ecc71,stroke:#27ae60,color:#fff
    style SCORES fill:#2ecc71,stroke:#27ae60,color:#fff

Why Local Beats Cloud for Administrative PII Processing

The conventional approach would be to build an authenticated admin dashboard backed by a database:

code

Browser → HTTPS → Load Balancer → API Server → Database (with PII)

Every component in that chain is an attack surface. The API server needs authentication — which can be bypassed. The database needs credentials — which can leak. The load balancer needs configuration — which can be misconfigured. The HTTPS termination needs certificates — which can expire. The API responses need to be scoped — which can be over-permissive.

The local approach:

code

Local filesystem → Local script → Local output

One component. No network. No authentication to bypass. No credentials to leak. No API to probe. No database to dump.

"Every exposed API becomes part of your attack surface."

When you process PII locally, you have zero network attack surface for that processing. A remote attacker cannot intercept, probe, or exfiltrate data from a process that never touches the network.

5. Minimal Attack Surface: Complexity as the Enemy

The most underappreciated security principle is this: every component you add to a system is a component that can fail, be misconfigured, or be exploited.

The Attack Surface Comparison

Component	Typical CRUD App	SSoC Architecture
Web Server	Express/Nginx with routes	Static file server (CDN)
Database	MongoDB/PostgreSQL	None
Authentication	JWT/Sessions/OAuth	None (public data)
API Layer	REST/GraphQL endpoints	None
Admin Panel	Authenticated web UI	Local-only (data-dependent)
Environment Variables	DB_URL, API_KEY, JWT_SECRET	None in production
Background Jobs	Queue workers, cron jobs	Offline scripts (manual)
File Upload Processing	Multipart handlers	None
Email Service	SMTP/SendGrid integration	Separate, not linked to platform
PII Storage	In production database	Local filesystem only

Each row where SSoC has "None" is an entire category of vulnerabilities that doesn't apply. Not because we've defended against them, but because the architecture doesn't create the conditions for them to exist.

The Dependency Argument

A React application built with Vite still has node_modules with hundreds of packages. Isn't that a supply chain risk?

Yes — at build time. But at runtime, the deployed output is static HTML, CSS, and JavaScript. There's no node_modules on the server. No require() calls that could be hijacked. No dynamic imports from npm. The supply chain risk is confined to the build step, which runs locally, not in production.

6. Why I Didn't Build a Typical CRUD App

Most web development tutorials teach this architecture:

javascript

// The tutorial approach
const express = require('express');
const mongoose = require('mongoose');

const app = express();
mongoose.connect(process.env.MONGODB_URI); // PII in the database

const UserSchema = new mongoose.Schema({
  name: String,
  email: String,      // PII
  phone: String,      // PII
  github: String,
  score: Number,
});
const User = mongoose.model('User', UserSchema);

app.get('/api/users', async (req, res) => {
  const users = await User.find({});  // All PII exposed via API
  res.json(users);
});

app.get('/api/users/:id', async (req, res) => {
  const user = await User.findById(req.params.id);  // Individual PII exposed
  res.json(user);
});

This is the default architecture that most developers reach for. It works. It's well-documented. It's what bootcamps teach. And for many applications, it's appropriate.

But for a programme handling 50,000 contributor records, this architecture means:

Every contributor's PII is one misconfigured query away from exposure — forget to add .select('-email -phone') to one route and you've leaked everything
The database connection string is a single point of compromise — one leaked environment variable and the entire dataset is accessible
Every API endpoint is a probe target — attackers can enumerate /api/users/1, /api/users/2, etc.
The admin panel shares infrastructure with the public site — a vulnerability in one can compromise the other

I'm not criticising beginners who build this way. I'm saying that when you're responsible for 50,000 people's personal data, you should ask: does this data need to be in a production database at all?

For SSoC, the answer was no.

Leaderboard scores can be pre-computed. Project metadata is public. Certificate data can be generated offline. The only operations that genuinely need PII — registration processing, onboarding communications, certificate personalisation — don't need a production database. They need a local spreadsheet and some scripts.

"The best way to protect sensitive data is to keep it out of places it never needed to be."

7. FinTech Lessons: A Decade of Building for Regulation

My software engineering philosophy has been shaped by more than a decade building FinTech systems in London. In financial services, security and privacy aren't features you add — they're constraints you design within. GDPR isn't a checklist; it's an engineering mindset.

Several principles from that experience directly influenced SSoC's architecture:

Privacy by Design

GDPR Article 25 requires that data protection is integrated into processing activities and business practices, from the design stage. For SSoC, this meant deciding before writing any code that PII would not enter the public deployment pipeline.

Least Privilege

In FinTech, every system component receives the minimum access required for its function. The scoring engine doesn't need email addresses to calculate PR scores, so it doesn't receive them. The public website doesn't need phone numbers to display leaderboards, so it doesn't have them.

Data Minimisation

GDPR Article 5(1)(c): personal data shall be adequate, relevant, and limited to what is necessary. The scoring engine's input is a list of GitHub usernames. Not a full registration export. Not a database view. A flat array of strings.

Separation of Duties

In financial systems, the person who initiates a transaction shouldn't be the same person who approves it. In SSoC, the system that serves public content is architecturally separate from the system that processes PII. They don't share databases, APIs, servers, or deployment pipelines.

Defence in Depth

No single control is sufficient. SSoC's security doesn't rely on "the database is password-protected" or "the admin panel requires authentication." It relies on multiple layers:

PII is absent from the public deployment (architectural isolation)
Administrative data files are gitignored (source control isolation)
Admin interfaces depend on local data (functional isolation)
Processing happens locally (network isolation)
Static architecture eliminates entire vulnerability classes (attack surface reduction)

Secure Defaults

The default state of the SSoC public platform — freshly deployed, no configuration — contains zero PII. You don't have to remember to enable encryption, set up access controls, or configure firewall rules. The default is secure because there's nothing sensitive to protect.

8. Threat Modelling: What Could Still Go Wrong

I promised honesty. Here it is.

What This Architecture Protects Against

Remote database dumps — there's no database to dump
API enumeration of PII — there's no API serving PII
Admin panel compromise — the admin panel has no data in production
Credential leaks exposing PII — there are no database credentials in production
Supply chain attacks at runtime — there's no server-side code execution in production

What This Architecture Does NOT Protect Against

No architecture is "hack-proof." Anyone who claims otherwise is selling something. Here are the real residual risks — and what we do about each one.

1. GitHub Account Compromise

If an attacker gains access to the GitHub account that owns the repository, they can push a modified user-scores.js containing malicious JavaScript. Every visitor's browser executes it. No database needed — the static site itself becomes the attack vector.

Mitigation: 2FA on all GitHub accounts, branch protection rules, signed commits, code review for all changes to data files.
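One check that makes review of generated data files easier is to confirm they contain data and nothing else. The sketch below assumes the generator emits strict JSON after the assignment (unlike the abbreviated sample earlier, which includes comments). parseDataScript is a hypothetical helper and not part of the scoring engine.

```javascript
// Returns the parsed data if the file is exactly `window.<name> = <JSON>;`.
// Throws otherwise: extra statements, functions, or comments all fail JSON.parse.
function parseDataScript(source, globalName = "userScores") {
  const prefix = `window.${globalName} = `;
  const text = source.trim();
  if (!text.startsWith(prefix) || !text.endsWith(";")) {
    throw new Error("Unexpected wrapper around data");
  }
  return JSON.parse(text.slice(prefix.length, -1));
}
```

This cannot stop an attacker who controls the account, since they can remove the check too. It does give reviewers and pre-commit hooks a cheap signal that a data file has started carrying code.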

2. npm Supply Chain (Build-Time)

The build process runs npm install and vite build. If a dependency is compromised — and this has happened before (event-stream, ua-parser-js, colors) — the built output could contain injected code. It ships as static files, but those static files execute in 50,000 browsers.

Mitigation: Lock file (package-lock.json) pins exact versions, npm audit before builds, minimal dependency footprint, local builds (not CI/CD that could be tampered with).

3. Google Account Compromise

The Master TSV originates from Google Sheets. 2FA bypass, session hijacking, OAuth token theft — if someone gets into that Google account, they have every email address and phone number ever submitted. The platform architecture protects the deployment, not the data source.

Mitigation: Google Advanced Protection, hardware security keys, limited sharing (single owner), regular access review.

4. Local Machine Compromise

Disk encryption helps, but if the local machine is compromised — malware, physical access, remote exploit — the Master TSV is right there on the filesystem. No amount of network separation helps when the attacker is already on the machine where the data lives.

Mitigation: Full-disk encryption, OS-level security updates, endpoint protection, physical security, minimal data retention (delete old exports).

5. Bonus Points Repository Tampering

The scoring engine fetches bonus points from a separate GitHub repository. If that repository is compromised, an attacker could award themselves thousands of points or penalise other contributors. The scoring engine trusts this input completely.

Mitigation: Repository access limited to programme administrators, branch protection, commit history auditing, bonus point totals reviewed during each scoring run.
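Part of that review can be automated. Below is a minimal sketch, assuming bonus points arrive as a map of username to integer points. validateBonusPoints and the maxPerUser limit are hypothetical and would need tuning to the programme's real rules.

```javascript
// Returns a list of human-readable problems; an empty list means the input looks sane.
function validateBonusPoints(bonus, registered, maxPerUser) {
  const allowed = new Set(registered.map((u) => u.toLowerCase()));
  const problems = [];

  for (const [user, points] of Object.entries(bonus)) {
    if (!allowed.has(user.toLowerCase())) {
      problems.push(`${user}: not a registered participant`);
    }
    if (!Number.isInteger(points)) {
      problems.push(`${user}: points must be an integer`);
    } else if (Math.abs(points) > maxPerUser) {
      // Math.abs also catches large penalties, not only large awards.
      problems.push(`${user}: ${points} exceeds limit of ${maxPerUser}`);
    }
  }
  return problems;
}
```

Running a check like this before the scoring engine consumes the bonus file turns "trusts this input completely" into "trusts this input within stated bounds".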

6. DNS/CDN Hijacking

If an attacker compromises DNS records or the GitHub Pages serving infrastructure, they can serve a modified version of the site to all visitors. The static files in the repository are genuine, but what users' browsers actually receive could be different.

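Mitigation: DNSSEC where supported, GitHub Pages' own infrastructure security, HTTPS enforcement, and Subresource Integrity (SRI) for any scripts loaded from third-party origins. SRI cannot help if the attacker is able to replace the HTML page that carries the integrity hashes.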

7. Social Engineering and Insider Threat

Someone could impersonate a legitimate need for registration data, or a programme administrator with legitimate access could misuse it.

Mitigation: Strict need-to-know policy, minimal number of people with access to raw registration data, audit trails for data exports.

mermaid

graph TB
    subgraph "Eliminated by Architecture (Green)"
        A1[Database Dump]
        A2[API Enumeration]
        A3[Admin Panel Exploit]
        A4[Credential Leak]
        A5[Runtime Supply Chain]
    end

    subgraph "Mitigated by Operational Security (Amber)"
        B1[GitHub Account Compromise]
        B2[npm Supply Chain - Build Time]
        B3[Google Account Compromise]
        B4[Local Machine Compromise]
        B5[DNS/CDN Hijacking]
        B6[Bonus Repo Tampering]
    end

    subgraph "Residual Risk (Red)"
        C1[Social Engineering]
        C2[Insider Threat]
        C3[Physical Access]
    end

    style A1 fill:#2ecc71,stroke:#27ae60,color:#fff
    style A2 fill:#2ecc71,stroke:#27ae60,color:#fff
    style A3 fill:#2ecc71,stroke:#27ae60,color:#fff
    style A4 fill:#2ecc71,stroke:#27ae60,color:#fff
    style A5 fill:#2ecc71,stroke:#27ae60,color:#fff
    style B1 fill:#f39c12,stroke:#e67e22,color:#fff
    style B2 fill:#f39c12,stroke:#e67e22,color:#fff
    style B3 fill:#f39c12,stroke:#e67e22,color:#fff
    style B4 fill:#f39c12,stroke:#e67e22,color:#fff
    style B5 fill:#f39c12,stroke:#e67e22,color:#fff
    style B6 fill:#f39c12,stroke:#e67e22,color:#fff
    style C1 fill:#e74c3c,stroke:#c0392b,color:#fff
    style C2 fill:#e74c3c,stroke:#c0392b,color:#fff
    style C3 fill:#e74c3c,stroke:#c0392b,color:#fff

"Architecture is your first security control."

But it's not your only one.

9. The Data Flow: End to End

Let's trace the complete lifecycle of a contributor's data through SSoC:

Registration

code

Contributor fills Google Form
  → Response stored in Google Sheet (PII: name, email, phone, GitHub, LinkedIn)
  → Sheet exported as TSV to local machine
  → TSV is gitignored, never committed

Scoring

code

Local script extracts GitHub usernames from TSV
  → Generates users.json (usernames only, no PII)
  → users.json committed to scoring engine repo
  → Scoring engine queries GitHub API for public PR data
  → Scores computed, static JS files generated
  → Static files deployed to public website

Administration

code

Ground Control loads TSV from local filesystem
  → Validates names, emails, phone numbers
  → Cross-references with scoring data
  → All processing happens in-browser, locally
  → No data sent to any server

Certification

code

Certificate generation script reads TSV locally
  → Generates certificate data with names and IDs
  → Certificate verification uses name + ID (minimal PII)
  → Deployed to public site for verification

At no point in this flow does the full registration dataset — with email addresses and phone numbers — enter a system accessible from the public internet.

10. Engineering Trade-Offs

This architecture is not free. There are real costs:

What We Give Up

Real-time leaderboards — scores update when the scoring engine runs, not continuously. There's a delay between a PR being merged and scores updating.
Self-service profile editing — contributors can't update their own information through the platform because there's no database to update.
Automated notifications — the platform can't send emails or push notifications because it has no backend.
Multi-admin access — Ground Control works on one machine with one dataset. There's no shared admin dashboard.
Scalable operations — everything that touches PII requires manual steps. At 50,000 contributors, this means careful batch processing.

What We Gain

Zero PII exposure through the public platform — the most critical win
Minimal operational security burden — no database credentials to rotate, no API keys to manage, no sessions to expire
Audit simplicity — the entire public deployment can be inspected as static files
Deployment simplicity — push static files to GitHub Pages, done
Resilience — no database to crash, no API to overload, no server to go down

The trade-offs are real, but for a seasonal programme with batch-oriented workflows, they're acceptable. A real-time trading platform couldn't work this way. A seasonal open-source programme can.

11. Lessons for Developers

If you're building a platform that handles user data, here's what I'd encourage you to think about:

Before You `npm install express mongoose`

Ask: does this data need to be in a production database?

If you need real-time reads and writes: yes, use a database
If you need user authentication: yes, you need a backend
If you can pre-compute and serve static results: maybe you don't

The Pre-Computation Question

Many applications that look dynamic are actually batch-processable:

Leaderboards that update hourly → pre-compute and deploy
Dashboards with daily metrics → generate static JSON overnight
Profile pages with relatively stable data → generate at build time
Documentation sites → static generation (this is already mainstream)

The PII Proximity Principle

The further PII is from your public infrastructure, the smaller your blast radius if something goes wrong.

code

PII in production database       → one query away from exposure
PII in separate internal service → one network hop away
PII on local machine only        → not served by any public system
PII never collected              → zero risk (but rarely practical)

Move down this list wherever you can.

Complexity Budget

Every component has a security cost. Budget for it:

code

Static file server:        Low complexity, low risk
API with public endpoints: Medium complexity, medium risk
Database with PII:         High complexity, high risk
Admin panel on same infra: Multiplied risk (shared blast radius)

If you can achieve your goals with fewer components, you should.

12. The /security Page: Making Transparency a Feature

Writing this blog post made something clear: the security architecture shouldn't just be documented externally — it should be visible to every contributor directly on the platform.

So we built a dedicated /security page, linked from the Tools page.

What It Covers

The page is split into two halves:

"How We Protect Your Data" — the platform's side:

Three overview cards: No Production Database, No Backend Server, PII Never Deployed
A data collection table showing exactly what we collect, why, and whether it's ever public — with green "Never public" badges for email, phone, and Discord
Expandable accordion sections explaining the static architecture, PII separation, local-first administration, and exactly what the public website can and cannot see
An attack vector comparison table showing 7 common web app vulnerabilities and why they don't apply to our architecture

"Protect Yourself" — the participant's side:

GitHub account security — enabling 2FA, reviewing OAuth apps, hiding your email from commits
Google account security — 2FA, Security Checkup, revoking unused app permissions
Discord security — 2FA, Nitro scam awareness, QR code warnings, DM caution
Passwords & authentication — password managers, haveibeenpwned.com, authenticator apps over SMS
Recognising phishing — a clear statement that SSoC will never ask for your password, and how to verify suspicious messages
Protecting your code — never committing secrets, using git diff before pushing, rotating leaked credentials immediately
Device security — full-disk encryption, OS updates, public Wi-Fi caution

Why a Security Page Matters

Most open-source programmes don't have a security page. Most probably don't need one. But when you're handling 50,000 contributors' personal data and the community is asking questions about data safety, transparency isn't optional — it's infrastructure.

The page ends with an honest footer: "No system is 100% secure — but architecture determines how much risk exists."

That's the same message as this blog post, distilled into a single sentence on a page that every contributor can find.

Key Engineering Takeaways

Architecture is your first security control. Encryption, authentication, and firewalls matter — but they defend data that's already in the danger zone. Architecture determines whether it gets there at all.
Static-first isn't just a performance strategy. It eliminates entire classes of vulnerabilities: injection, authentication bypass, session hijacking, API enumeration. The attack surface you don't create is the one that can never be exploited.
PII separation is non-negotiable at scale. When you're responsible for 50,000 people's personal data, the question isn't "how do we secure the database?" It's "does this data need to be in a database at all?"
Local processing is underrated. Not everything needs to be a cloud service. Administrative tasks that touch PII can often be performed locally, eliminating network attack surface entirely.
Honest threat modelling beats false confidence. No system is "hack-proof." The goal is to reduce attack surface, minimise blast radius, and be transparent about residual risks.
Trade-offs are real and worth making. We give up real-time updates and self-service features in exchange for a dramatically reduced security burden. For a seasonal programme, that trade-off is correct.

Conclusion

The question contributors ask — "How do you protect our data?" — has a simple answer: we keep it away from the public internet.

Not behind a firewall. Not encrypted in a database. Not protected by authentication. Away. On a local machine, processed by local scripts, never deployed to any public server.

This isn't the right architecture for every application. Real-time collaborative tools need databases. E-commerce platforms need payment processing. Social networks need user-generated content storage. But for a seasonal open-source programme that computes leaderboards, generates certificates, and displays project metadata — a static-first, PII-separated, local-admin architecture eliminates more risk than any amount of runtime security controls could provide.

"The strongest security often comes from reducing complexity and limiting what can be reached in the first place."

The next time you're designing a system that handles user data, before you spin up a database and wire up API endpoints, ask yourself: does this data actually need to be here?

You might find that the safest architecture is the one that keeps sensitive data out of the blast zone entirely.

Praveen Kumar Purushothaman is VP & Director of Engineering at Social Summer of Code (SSoC). He has spent over a decade building FinTech systems in London, where GDPR, privacy-by-design, and minimising attack surfaces are fundamental engineering principles. SSoC 2026 serves 50,000+ contributors across hundreds of open-source projects.
