SSoC 2026 has 50,000 registered participants, 300+ projects, and 5,000+ mentors. The public-facing site is what everyone sees: the leaderboard, the project cards, the contributor profiles. But the tool that actually keeps the programme running lives at a route that is not in the navbar, not in the footer, and not linked from anywhere. You get to it by navigating to the hidden /local-prav-tools route and clicking “Ground Control”.


This is the story of building internal tooling for a programme about open source, and why the most impactful code we wrote this season is the code nobody will ever see.
The Problem: Death by Discord DM
Every open-source programme at scale has the same failure mode. A contributor opens a Discord thread:
“Hey, I submitted a PR two days ago but my score isn’t showing on the leaderboard.”
Sounds simple. It is not. To debug this, you need to check three separate data sources:
- Registration data: Did they register? Is their GitHub username spelled correctly? Did they paste the full URL instead of just the username?
- Scoring engine output: Did the scoring engine pick up their PR? Is there a case mismatch between UserName and username?
- Project registry: Is the repo they contributed to actually a registered SSoC project?
Each of these lives in a different place. Registration data is a Google Form that exports to a TSV. Scoring engine output is a set of JSON files generated by a Node script that runs on someone’s laptop. The project registry is a JSON config file in the site repo.
Before Ground Control, resolving a single “where’s my score?” question took 15-30 minutes of manual cross-referencing. With 50,000 participants, even a 1% confusion rate means 500 support threads, which at 15-30 minutes each is roughly 125 to 250 hours of manual lookups. The math does not work.
Ground Control collapses that 15-30 minutes to about 10 seconds.
Architecture: Deliberately Unsophisticated
Ground Control has no backend. No database. No API. The entire system runs on two data sources:
- A TSV file exported from Google Forms, dropped into /LocalData/MasterSheetsData.tsv
- A set of JSON globals (window.prs, window.userScores, window.paMetrics) injected via script tags from the scoring engine
That is it. The React component fetches the TSV over a local HTTP request, parses it in the browser, and cross-references it with whatever scoring data happens to be loaded in the page.
This is a deliberate architectural choice, not a shortcut. The registration data contains phone numbers, email addresses, and LinkedIn profiles. Running a backend means that data traverses a network. With Ground Control running purely client-side against a local file, registration data never leaves the admin’s machine. Privacy by architecture, not by policy.
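Because the page cross-references "whatever scoring data happens to be loaded", the reading side has to tolerate missing globals. Here is a minimal sketch of that idea. The interface shapes, field names, and the readScoringGlobals helper are all assumptions for illustration; only the global names and the window.prs.prs nesting come from the tool itself.

```typescript
// Hypothetical shapes: the post names the globals but not their structure.
interface ScoredPr {
  user: string; // GitHub username as recorded by the scoring engine
  repo: string;
  score: number;
}

interface ScoringGlobals {
  prs?: { prs?: ScoredPr[] };
  userScores?: Record<string, number>;
  paMetrics?: unknown;
}

interface ScoringSnapshot {
  prs: ScoredPr[];
  userScores: Record<string, number>;
  loaded: boolean; // false when no scoring script tag was injected
}

// Read whatever scoring data is present, falling back to empty values
// so the rest of the page never has to null-check the globals.
function readScoringGlobals(source: ScoringGlobals): ScoringSnapshot {
  const prs =
    source.prs !== undefined && Array.isArray(source.prs.prs)
      ? source.prs.prs
      : [];
  const userScores = source.userScores || {};
  const loaded = prs.length > 0 || Object.keys(userScores).length > 0;
  return { prs, userScores, loaded };
}
```

In the browser this would be called with the window object cast to the assumed shape, for example readScoringGlobals(window as unknown as ScoringGlobals). Nothing here touches the network, which keeps the privacy property intact.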
The TSV Parser Nobody Wanted to Write
Google Forms exports TSV. TSV is simple until it is not. The “Project Description” field is a free-text area, which means participants paste multi-line content, which means the TSV has quoted fields that span multiple lines.
The parser handles this by walking through lines and tracking quote parity:
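// `lines` holds the raw TSV text split on newlines.
const logicalRows: string[] = [];
let current = "";
let inQuote = false;
for (const line of lines) {
  // Start a new row, or continue a quoted field that spans lines.
  if (!inQuote) {
    current = line;
  } else {
    current += "\n" + line;
  }
  // An odd number of quotes on this line opens or closes a quoted field.
  const quoteCount = (line.match(/"/g) || []).length;
  if (quoteCount % 2 === 1) {
    inQuote = !inQuote;
  }
  if (!inQuote) {
    logicalRows.push(current);
    current = "";
  }
}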
Is this a production-grade TSV parser? No. Does it handle every edge case in RFC 4180? Definitely not. But it handles the specific shape of data that Google Forms actually produces, which is the only shape we care about. Internal tools earn the right to be narrow.
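Once the logical rows are assembled, each one still has to be split into fields without breaking on tabs or newlines inside quotes. The sketch below shows one way to do that. It is not the actual Ground Control code: splitTsvRow is a hypothetical helper, and it assumes the export wraps multi-line fields in double quotes and doubles any embedded quote, CSV-style.

```typescript
// Split one logical row into fields. Assumes quoted fields use "..." and
// that a literal quote inside a quoted field is written as "".
function splitTsvRow(row: string): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuote = false;
  for (let i = 0; i < row.length; i++) {
    const ch = row.charAt(i);
    if (ch === '"') {
      if (inQuote && row.charAt(i + 1) === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else {
        inQuote = !inQuote; // opening or closing quote, not part of the value
      }
    } else if (ch === "\t" && !inQuote) {
      fields.push(field); // an unquoted tab ends the current field
      field = "";
    } else {
      field += ch; // includes newlines inside quoted fields
    }
  }
  fields.push(field);
  return fields;
}
```

Like the row assembler, this stays deliberately narrow: it does nothing clever with stray quotes in unquoted fields, because it only needs to cope with one form's output.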
The Registrations Tab: Validate Everything
The first thing Ground Control does when it loads the TSV is run every single row through a validation pipeline. Not because we are pedantic about data quality, but because dirty registration data causes real downstream failures.
Here is what it checks:
- Name casing: Is “john doe” actually “John Doe”? The toTitleCase function is naive but effective.
- GitHub URL cleanup: People paste https://github.com/username/, github.com/username, @username, or just username. The extractGitHubUsername function normalises all of these down to the bare username. If the stored value differs from the clean version, that is an issue.
- LinkedIn URL cleanup: Strip query parameters and fragments. LinkedIn URLs with ?utm_source=... trailing params break nothing but look sloppy and signal data quality problems.
- Phone format: Must be international format (+91...). India-centric, yes, but that is where 90%+ of our participants are.
- Email validation: Basic regex. Catches the obvious typos.
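The checks above could look roughly like the sketch below. Only the names toTitleCase and extractGitHubUsername come from the tool. Their bodies here, the RegistrationRow shape, the other helpers, the regexes, and the issue messages are all assumptions made for illustration.

```typescript
// Hypothetical row shape; the real column names are not given in the post.
interface RegistrationRow {
  name: string;
  github: string;
  linkedin: string;
  phone: string;
  email: string;
}

// Naive title casing: first letter upper, the rest lower, per word.
function toTitleCase(name: string): string {
  return name
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join(" ");
}

// Reduce a full URL, github.com/user, @user, or user to the bare username.
function extractGitHubUsername(raw: string): string {
  let value = raw.trim();
  value = value.replace(/^https?:\/\//i, ""); // drop the protocol
  value = value.replace(/^www\./i, "");
  value = value.replace(/^github\.com\//i, ""); // drop the host
  value = value.replace(/^@/, ""); // drop a leading @
  value = value.replace(/[?#].*$/, ""); // drop query string and fragment
  const first = value.split("/")[0];
  return first === undefined ? "" : first;
}

// Strip query parameters and fragments from a LinkedIn URL.
function cleanLinkedInUrl(url: string): string {
  return url.trim().replace(/[?#].*$/, "");
}

// Assumed rule: a plus sign followed by 8 to 15 digits, ignoring spaces and hyphens.
function isInternationalPhone(phone: string): boolean {
  return /^\+\d{8,15}$/.test(phone.replace(/[\s-]/g, ""));
}

// Basic shape check only: something@something.something
function isPlausibleEmail(email: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email.trim());
}

// Collect a human-readable issue for every rule the row fails.
function findIssues(row: RegistrationRow): string[] {
  const issues: string[] = [];
  const cleanName = toTitleCase(row.name);
  if (row.name !== cleanName) issues.push(`Name: should be "${cleanName}"`);
  const username = extractGitHubUsername(row.github);
  if (row.github !== username) issues.push(`GitHub: should be just "${username}"`);
  if (row.linkedin !== cleanLinkedInUrl(row.linkedin)) {
    issues.push("LinkedIn: strip query parameters and fragments");
  }
  if (!isInternationalPhone(row.phone)) issues.push("Phone: must be in international format");
  if (!isPlausibleEmail(row.email)) issues.push("Email: does not look valid");
  return issues;
}
```

The title-casing here is naive in exactly the way the list admits: a name like "McDonald" would be flagged, so a human still reviews the output before anyone is asked to change anything.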
Deduplication Logic
People submit the registration form multiple times. A naive dedup by email would break Project Admins, who legitimately submit once per project. So the dedup key is context-sensitive:
const isPA = row.role.trim() === "Project Admin";
const key = isPA && repo ? `${email}::${repo}` : email;
For Project Admins: email + repoURL. For everyone else: just email. Within each group, we keep the latest submission (sorted by timestamp) and mark the rest as duplicates.
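Put together, the keep-the-latest rule can be sketched as follows. This is an illustration rather than the shipped code: the FormRow shape, the numeric submittedAt field (the form timestamp already parsed to epoch milliseconds), the email normalisation, and the markDuplicates name are all assumptions.

```typescript
// Hypothetical row shape for illustration.
interface FormRow {
  submittedAt: number; // form timestamp, assumed already parsed to epoch ms
  email: string;
  role: string;
  repo: string;
}

interface DedupedRow {
  row: FormRow;
  isDuplicate: boolean;
}

// Project Admins are keyed by email + repo; everyone else by email alone.
function dedupKey(row: FormRow): string {
  const email = row.email.trim().toLowerCase();
  const repo = row.repo.trim();
  const isPA = row.role.trim() === "Project Admin";
  return isPA && repo ? `${email}::${repo}` : email;
}

// Keep the latest submission per key and flag every other row as a duplicate.
function markDuplicates(rows: FormRow[]): DedupedRow[] {
  const latestByKey = new Map<string, FormRow>();
  for (const row of rows) {
    const key = dedupKey(row);
    const seen = latestByKey.get(key);
    // On a timestamp tie, the row that appears later in the file wins.
    if (seen === undefined || row.submittedAt >= seen.submittedAt) {
      latestByKey.set(key, row);
    }
  }
  return rows.map((row) => ({
    row,
    isDuplicate: latestByKey.get(dedupKey(row)) !== row,
  }));
}
```

Flagging duplicates instead of dropping them is what would make a “Show Dupes” style toggle possible: the rows are still there, just marked.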
The Stats Dashboard
After processing, Ground Control renders eight stat cards at a glance: Unique Registrations, Contributors, Project Admins, Mentors, Other Roles, Unique Projects, Duplicates, and Records With Issues.
Below that, collapsible diagnostics panels show:
- Missing Fields breakdown: which fields are most commonly left blank
- Issue Breakdown: how many records fail each validation rule
- Registration Activity: last 24 hours, last 7 days, peak registration day
- Top Email Domains: a bar chart showing gmail.com vs. university domains vs. everything else
- Top Tech Stacks: what people are actually building with
- Mentor Coverage: projects with and without assigned mentors
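Most of these panels are simple aggregations over the processed rows. As one example, the Top Email Domains numbers could be computed along these lines. The function name, signature, and tie-breaking rule are assumptions for illustration.

```typescript
interface DomainCount {
  domain: string;
  count: number;
}

// Count registrations per email domain and return the most common ones.
function topEmailDomains(emails: string[], limit: number): DomainCount[] {
  const counts = new Map<string, number>();
  for (const email of emails) {
    const at = email.lastIndexOf("@");
    if (at < 0) continue; // skip values that are not email-shaped
    const domain = email.slice(at + 1).trim().toLowerCase();
    if (domain) counts.set(domain, (counts.get(domain) || 0) + 1);
  }
  const result: DomainCount[] = [];
  counts.forEach((count, domain) => {
    result.push({ domain, count });
  });
  // Highest count first; alphabetical order breaks ties.
  result.sort((a, b) => b.count - a.count || a.domain.localeCompare(b.domain));
  return result.slice(0, limit);
}
```

The other panels follow the same pattern: group by a field, count, sort, and slice.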
The search bar does full-text matching across all fields. Alongside it sit a role filter dropdown, an “Issues Only” toggle, a “Show Dupes” toggle, and a “Multi-Project PAs” filter. Everything you need to slice the data without opening a spreadsheet.
The Share Issues Button
When you find records with problems, you need to tell people to fix their data. Ground Control generates formatted Discord markdown:
**Data Issues Report**
Found **47** record(s) with issues.
> **john doe** (Contributor)
> - Name: Should be “John Doe”
> - GitHub: Full URL given, should be just “johndoe”
One click to copy. Paste into Discord. Done.
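Generating that report is mostly string assembly. A minimal sketch, assuming each flagged record carries a name, a role, and a list of issue messages (the IssueRecord shape and buildIssuesReport name are hypothetical):

```typescript
// Hypothetical shape: one entry per registration row, with its issue messages.
interface IssueRecord {
  name: string;
  role: string;
  issues: string[];
}

// Build a Discord-flavoured markdown report for records that have issues.
function buildIssuesReport(records: IssueRecord[]): string {
  const flagged = records.filter((record) => record.issues.length > 0);
  const lines: string[] = [
    "**Data Issues Report**",
    `Found **${flagged.length}** record(s) with issues.`,
  ];
  for (const record of flagged) {
    lines.push(`> **${record.name}** (${record.role})`);
    for (const issue of record.issues) {
      lines.push(`> - ${issue}`);
    }
  }
  return lines.join("\n");
}
```

In the browser, the resulting string could be handed to navigator.clipboard.writeText from the button's click handler, which is one plausible way to get the one-click copy.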
The PRs Tab: Cross-Referencing the Scoring Engine
The second tab is newer and solves a different problem. The leaderboard scoring engine produces JSON output that tells us which PRs were scored, at what level, and for how many points:
- Easy: 20 pts, Medium: 30 pts, Hard: 40 pts
- Beginner: 50 pts, Intermediate: 150 pts, Advanced: 200 pts
The PRs tab takes a Discord ID or GitHub username as input and immediately shows you the full picture: registration status, scored PRs, total score, and, critically, what went wrong if something did.
The diagnostics panel uses color-coded cards to surface problems instantly:
- Green (Registered): Everything checks out. Shows name, role, GitHub, Discord ID.
- Amber (Case Mismatch): Registered as UserName, scoring engine has username. GitHub usernames are case-insensitive, but string matching is not.
- Amber (No PRs Scored): Registered but zero scored PRs. Might be new, might be a problem.
- Red (Not Registered): PRs exist in the scoring data but the user is not in the registration TSV.
- Amber (Unregistered Project PRs): PRs submitted to repos that are not in the project registry.
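The card logic above amounts to a handful of comparisons between three lists. The sketch below shows one way to express it for a GitHub username lookup. The function name, parameter shapes, and the choice to show the green card only when nothing else fires are assumptions, not a description of the real component.

```typescript
interface DiagnosticCard {
  color: "green" | "amber" | "red";
  label: string;
}

// Hypothetical minimal PR shape: who opened it and against which repo.
interface PrRecord {
  user: string;
  repo: string;
}

function diagnose(
  query: string, // GitHub username being looked up
  registeredUsernames: string[], // usernames from the registration TSV
  prs: PrRecord[], // scoring engine output
  registeredRepos: string[] // project registry
): DiagnosticCard[] {
  const wanted = query.trim().toLowerCase();
  // The username exactly as it was registered, if it was registered at all.
  const stored = registeredUsernames
    .map((u) => u.trim())
    .filter((u) => u.toLowerCase() === wanted)[0];
  const isRegistered = stored !== undefined;
  const userPrs = prs.filter((pr) => pr.user.toLowerCase() === wanted);
  const knownRepos = registeredRepos.map((r) => r.trim().toLowerCase());

  const cards: DiagnosticCard[] = [];
  if (!isRegistered && userPrs.length > 0) {
    cards.push({ color: "red", label: "Not Registered" });
  }
  if (isRegistered && userPrs.some((pr) => pr.user !== stored)) {
    cards.push({ color: "amber", label: "Case Mismatch" });
  }
  if (isRegistered && userPrs.length === 0) {
    cards.push({ color: "amber", label: "No PRs Scored" });
  }
  if (userPrs.some((pr) => knownRepos.indexOf(pr.repo.toLowerCase()) === -1)) {
    cards.push({ color: "amber", label: "Unregistered Project PRs" });
  }
  // Green only when the user is registered and nothing else fired.
  if (isRegistered && cards.length === 0) {
    cards.push({ color: "green", label: "Registered" });
  }
  return cards;
}
```

Matching case-insensitively first and then comparing the exact strings is what lets the tool report a case mismatch instead of silently failing to find the user.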
The lookup maps are built with useMemo for performance: one case-insensitive Map from Discord ID to registration row, and another from GitHub username to registration row. No new data is fetched. The PRs tab receives the processed registration data as props from the parent component and reads scoring data from window.prs.prs and window.userScores.
// Case-insensitive lookup from Discord ID to processed registration row.
const discordToRow = useMemo(() => {
  const map = new Map<string, ProcessedRow>();
  for (const p of processed) {
    const did = p.row.discordId?.trim().toLowerCase();
    if (did) map.set(did, p);
  }
  return map;
}, [processed]);
The PR table is sortable by every column: PR number, project, title, level, score, merged date. A summary row above the table shows total PRs, total score, and a level breakdown with color-coded badges. A “Show All” mode dumps every scored PR with cross-column text filtering.
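Sorting and the summary row are both small pure functions over the PR list. A rough sketch, assuming a flat PR record with the six table columns and ISO-formatted merge dates that sort correctly as strings (the field names and both function names are hypothetical):

```typescript
// Hypothetical PR shape mirroring the table columns.
interface ScoredPullRequest {
  number: number;
  project: string;
  title: string;
  level: string;
  score: number;
  mergedAt: string; // assumed ISO 8601, so string order matches date order
}

// Return a sorted copy; numeric columns compare numerically, the rest as text.
function sortPrs(
  prs: ScoredPullRequest[],
  key: keyof ScoredPullRequest,
  direction: "asc" | "desc"
): ScoredPullRequest[] {
  const sign = direction === "asc" ? 1 : -1;
  return prs.slice().sort((a, b) => {
    const left = a[key];
    const right = b[key];
    if (typeof left === "number" && typeof right === "number") {
      return sign * (left - right);
    }
    return sign * String(left).localeCompare(String(right));
  });
}

interface PrSummary {
  totalPrs: number;
  totalScore: number;
  byLevel: Record<string, number>;
}

// Totals for the summary row: PR count, score sum, and PRs per level.
function summarisePrs(prs: ScoredPullRequest[]): PrSummary {
  const byLevel: Record<string, number> = {};
  let totalScore = 0;
  for (const pr of prs) {
    totalScore += pr.score;
    byLevel[pr.level] = (byLevel[pr.level] || 0) + 1;
  }
  return { totalPrs: prs.length, totalScore, byLevel };
}
```

Sorting a copy instead of the original array matters in React, where mutating props or state in place can cause stale renders.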
The Tech Stack
React, TypeScript, Tailwind CSS, Lucide icons. Dark theme (slate-950 background). No routing within Ground Control itself, just a tab switcher between Registrations and PRs. The parent page loads the TSV, processes it, and passes the ProcessedRow[] array down as props. The PRs tab is a separate component (GroundControlPRs.tsx) that receives that array and does its own cross-referencing.
Total footprint: two files, roughly 1,200 lines combined. No external dependencies beyond what the main site already uses.
Honest Trade-Offs
This system has real limitations and I am not going to pretend otherwise.
The TSV is a manual export. Someone downloads it from Google Sheets and drops it into the LocalData directory. There is no sync, no webhook, no automation. If the TSV is stale, Ground Control shows stale data. We could automate this with the Google Sheets API, but the manual step takes 30 seconds and happens once a day. The complexity of OAuth token management and API quota handling is not worth it for a seasonal programme.
The scoring engine runs on a laptop. The JSON globals that power the PRs tab are generated by a Node script that someone runs locally. The output gets committed to the repo. This is fine because the scoring engine needs manual oversight anyway: you want a human reviewing the PR labels and scores before they go live.
The TSV parser is fragile. It handles the common cases but would break on pathological inputs. We do not care. The input shape is controlled by a single Google Form. If the form changes, we update the parser. This is a feature, not a bug: the parser is simple enough that updating it takes minutes.
There is no auth. Ground Control is hidden behind an unlisted route, not protected by authentication. Security through obscurity is not security, but the data it accesses is either already public (GitHub usernames, project names) or only available locally (the TSV file on the admin’s machine). The threat model is “someone finds the route”, and the consequence is “they see an empty dashboard because they do not have the TSV file”.
The Meta-Observation
There is an irony in building admin tools for a programme dedicated to open source. SSoC exists to get people excited about contributing to public repositories. The most impactful engineering work we did this season, the tool that saves the most time, prevents the most support overhead, and keeps the programme running smoothly, is a hidden page that no participant will ever see.
This is not unusual. At every company I have worked at, the internal tools are where engineering leverage is highest. A public-facing feature might delight users, but an internal tool that saves 20 minutes per support ticket, multiplied by hundreds of tickets, buys back weeks of human time.
Ground Control is not elegant. It parses TSV files in the browser. It reads globals off the window object. It has a route called /local-prav-tools. But it works, it was built in days not months, and it turned a 15-30 minute manual process into a 10-second lookup.
Sometimes the best engineering is the engineering that nobody sees.
