NASDAQ: CG
Harnoor Minhas
Carlyle · Agentic AI Dashboard · made by Harnoor Minhas · May 2026

Agentic AI for Carlyle, grounded on data your teams can trust.

One trusted source of company data on Snowflake, so people can ask in plain English and AI agents can plan, retrieve, and act reliably, with every answer governed and cited.

70% less manual processing
95% extraction accuracy
½ PB Snowflake in production
THE IDEA, IN ONE PICTURE
Messy data from many systems
↓
One clean, trusted record (MDM)
↓
People & AI get reliable answers
ⓘ A reference demo built for this role by Harnoor Minhas. The data is illustrative: it's here to show how I think, not a live system.
Carlyle (NASDAQ: CG): price history
Real daily closes · last on
What this actually solves
🎯
A single source of truth

One authoritative record for every team and AI system, with no competing definitions.

💬
Answers in plain language

Ask like a sentence, get a chart. No analyst queue.

🛡️
Governed, auditable AI

Every answer grounded, cited and logged: fit for a regulated firm.

Demonstration

A brief demonstration

Five short, interactive examples. Each is introduced in one plain-language sentence before any detail. Click through them.

Example 1
Ask a question in plain language.

In plain language: a manager types a normal sentence; the system writes the database query, returns a chart, and reads the answer aloud. Tap a question to ask it.

Example 2
Three conflicting records, one trusted record.

In plain language: the same portfolio company (a financial-services firm) appears three different ways across three systems. The platform reconciles them into one authoritative record. (Technically, this is Master Data Management.)

Bloomberg
Atlas Capital Holdings, Inc.
LEI 5493001KJTIIGC8Y1R12
S&P Capital IQ
Atlas Capital Hldgs
CIK 0001702010
Internal CRM
Atlas Capital Holdings LLC
ID PC-00342

Same firm: three names, three identifiers. Which one is canonical?

✓ ONE TRUSTED GOLDEN RECORD
Atlas Capital Holdings, Inc.
Legal name: Atlas Capital Holdings, Inc. (source: S&P · authoritative)
LEI: 5493001KJTIIGC8Y1R12 (source: Bloomberg)
CIK: 0001702010 (source: S&P)
Internal ID: PC-00342 (source: CRM)
Each field keeps the most authoritative source: fully traceable, ready for AI to use.
Example 3
Ask a document a question, and get a cited answer.

In plain language: instead of guessing, the AI looks up the most relevant passages in a document and answers using only those, then shows exactly which passage it used. (The tech: retrieval-augmented generation, or RAG; in production, Snowflake Cortex Search.)
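As a rough illustration of "look up the relevant passages, then cite them", the Python sketch below ranks passages by word overlap with the question and attaches a company and page citation to each hit. It is a stand-in only: Cortex Search uses embeddings rather than word overlap, the passages are invented sample text, and `search` and `cite` are hypothetical helper names. The column names mirror the ones used in the Cortex Search definition later on this page.

```python
import re
from collections import Counter

# Invented sample passages; golden_id reuses the illustrative internal ID from Example 2.
CHUNKS = [
    {"golden_id": "PC-00342", "page_number": 12,
     "content_chunk": "Roughly a fifth of revenue comes from defense and government contracts."},
    {"golden_id": "PC-00342", "page_number": 27,
     "content_chunk": "Exposure to government customers is concentrated in two agencies."},
    {"golden_id": "PC-00342", "page_number": 40,
     "content_chunk": "The company opened two new offices during the year."},
]

def tokens(text):
    """Lowercase alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

def search(chunks, query, limit=3):
    """Rank chunks by how many query words they share; drop chunks with no overlap."""
    wanted = Counter(tokens(query))
    scored = []
    for chunk in chunks:
        overlap = sum((Counter(tokens(chunk["content_chunk"])) & wanted).values())
        if overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:limit]]

def cite(chunk):
    """Return the passage followed by its source: golden record ID and page."""
    return f'{chunk["content_chunk"]} [{chunk["golden_id"]}, p. {chunk["page_number"]}]'

hits = search(CHUNKS, "defense and government exposure")
for hit in hits:
    print(cite(hit))
```

The point of the design is that the answer can only be assembled from returned passages, and every passage carries its own citation, so an unsupported claim has nowhere to come from.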

Example 4
Streaming vs. yesterday's batch: why real-time matters.

In plain language: the old way refreshes data once a night, so by midday it's stale. A streaming pipeline keeps the number live to the second. Watch the two diverge. (The tech: Apache Kafka → Snowpipe Streaming, vs a nightly batch load.)

Portfolio events → Apache Kafka → Snowpipe Streaming → Live Dynamic Table
STREAMING · live
deployed capital run-rate · updates every second
NIGHTLY BATCH · last refreshed 02:00
$4,120M
same metric, frozen at last night's snapshot

By midday the batch number is already behind reality. Streaming closes that gap.
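To make the gap concrete, here is a minimal Python sketch of the two views computed over the same event log. Only the $4,120M snapshot figure comes from the panel above; the event times and amounts are invented, and `batch_view` and `streaming_view` are hypothetical helper names, not Kafka or Snowflake APIs.

```python
from datetime import datetime

# Hypothetical events: (event time, change in deployed capital in $M).
# The first row stands in for everything loaded before last night's snapshot.
EVENTS = [
    (datetime(2026, 5, 1, 1, 0), 4120.0),
    (datetime(2026, 5, 1, 9, 30), 35.0),
    (datetime(2026, 5, 1, 11, 15), 20.0),
]

def batch_view(events, last_refresh):
    """Nightly batch: only sees events that arrived before the last refresh."""
    return sum(amount for ts, amount in events if ts <= last_refresh)

def streaming_view(events, now):
    """Streaming: sees every event up to the current moment."""
    return sum(amount for ts, amount in events if ts <= now)

last_refresh = datetime(2026, 5, 1, 2, 0)   # the 02:00 nightly load
midday = datetime(2026, 5, 1, 12, 0)

batch = batch_view(EVENTS, last_refresh)
live = streaming_view(EVENTS, midday)
print(f"batch ${batch:,.0f}M | live ${live:,.0f}M | gap ${live - batch:,.0f}M")
```

Both functions read the same events; the only difference is the cutoff time, which is exactly the staleness a nightly load builds in.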

Example 5
When the job is heavy ML, reach for the lakehouse.

In plain language: most questions are answered inside Snowflake. But training large models on years of history is a different job: that's where a Databricks lakehouse and MLflow earn their place, reading the same governed golden records over open Iceberg tables. One source, two engines.

Shared foundation

Iceberg golden records: no copy, no drift.

Snowflake / Cortex

Serving, NL→SQL, RAG, agents — governed.

Databricks + MLflow

Heavy training, feature store, experiment tracking.

The discipline: pick the engine the workload needs, without ever forking the source of truth.

The approach

How it works: five steps

1
Collect

Pull data continuously from every system: Bloomberg, S&P, CRM, filings. (streaming + connectors)

2
Unify

Match duplicates and build one trusted "golden" record per company. (MDM)

3
Understand

Define metrics once so language maps to data the same way every time. (semantic layer)

4
Serve

Let people and AI agents ask, retrieve, and act on it. (RAG, agents, copilots)

5
Govern

Wrap everything in access controls, citations, logging, and quality checks, so every answer is safe and auditable. (governance & eval)

The engineering, in detail

Technical architecture & implementation

For technical reviewers: the underlying architecture, Snowflake SQL, design trade-offs, and a requirement-by-requirement mapping to the job description. Each section expands on request.

Palantir Foundry ontology: how it fits

In plain language: an ontology turns rows in tables into the business objects Carlyle actually reasons about (a Fund, a Portfolio Company, a Deal), with the relationships between them and the governed actions you can take. It is the shared language analysts, applications, and AI agents all use. Foundry and the Snowflake foundation are complementary: the same golden records back both, giving one canonical source and two consumption planes.

Limited Partner commits to → Fund invests in → Portfolio Company has → Valuation

Every object is backed by the same Snowflake golden records: one canonical source feeding both Foundry and Cortex.

Object types
Fund · Portfolio Company · Deal · Limited Partner · Valuation · Sector
Governed actions (with approval)
  • Record a new portfolio-company valuation
  • Advance a deal stage (Sourced → Diligence → Close)
  • Flag a concentration or covenant risk
  • Generate an LP reporting pack for a fund

Why it matters: agents and analysts operate on objects and actions, not raw tables, so AI work stays grounded, permissioned, and auditable. My Foundry experience (ontology design + integration on a healthcare supply-chain platform) maps directly to portfolio-operations objects here.
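The objects-and-actions idea can be sketched in a few lines of plain Python. This is not the Foundry API: the class names simply mirror three of the object types listed above, and `record_valuation`, `ApprovalRequired`, the fund name, the approver, and the valuation figure are all hypothetical. The sketch shows one governed action that refuses to run without an approver and writes an audit entry when it does run.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Valuation:
    as_of: str
    amount_musd: float
    approved_by: str

@dataclass
class PortfolioCompany:
    golden_id: str                       # key of the golden record backing this object
    name: str
    valuations: List[Valuation] = field(default_factory=list)

@dataclass
class Fund:
    name: str
    investments: List[PortfolioCompany] = field(default_factory=list)

class ApprovalRequired(Exception):
    """Raised when a governed action is attempted without an approver."""

def record_valuation(company: PortfolioCompany, as_of: str, amount_musd: float,
                     approved_by: Optional[str], audit_log: list) -> Valuation:
    """Governed action: record a valuation only with a named approver, and log it."""
    if not approved_by:
        raise ApprovalRequired("valuation changes need a named approver")
    valuation = Valuation(as_of, amount_musd, approved_by)
    company.valuations.append(valuation)
    audit_log.append({"action": "record_valuation",
                      "object": company.golden_id,
                      "approved_by": approved_by})
    return valuation

# Illustrative usage with made-up values.
audit_log: list = []
atlas = PortfolioCompany("PC-00342", "Atlas Capital Holdings, Inc.")
fund = Fund("Example Fund I", [atlas])
record_valuation(atlas, "2026-03-31", 250.0, "valuation-committee", audit_log)
print(len(atlas.valuations), audit_log[0]["action"])
```

An agent calling `record_valuation` gets the same permission check and audit trail as an analyst would, which is the practical meaning of operating on actions rather than on raw tables.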

Reference architecture
GOVERNANCE
Wraps every layer
RBAC · masking
lineage · eval
audit logging
Sources: Bloomberg · S&P · CRM · SEC EDGAR · PDFs
↓
Ingest: Snowpipe Streaming + Fivetran
↓
Bronze: Iceberg raw tables
↓
MDM (the foundation): matching + survivorship → golden records
↓
Silver: canonical Dynamic Tables
↓
Semantic layer: Cortex Analyst model (one contract for BI + AI)
↓
Cortex AI: Search (RAG) · Analyst (NL→SQL) · Agents · AI SQL
↓
Consumers: analysts + AI agents / copilots

Read top to bottom: data flows from sources to consumers. Governance wraps every layer. MDM is the foundation: golden records come before AI, and everything is Snowflake-native so RAG, NL→SQL, and agents stay inside the governance boundary.

🧬

How the matching works, in plain English

The three steps behind Examples 2 and 3, explained simply. The actual Snowflake code sits under each step for engineers; you don't need to read it to get the idea.

1. Find the likely matches, then score them

Rather than compare every record to every other one (far too slow), the system first groups records that could be the same firm, then scores how alike their names are. A high score means it's the same company.

-- SOUNDEX blocking avoids an N² comparison; score with Jaro-Winkler (0-100)
WITH pairs AS (
  SELECT a.raw_id AS id_a, b.raw_id AS id_b, a.cik AS cik_a, b.cik AS cik_b,
         JAROWINKLER_SIMILARITY(LOWER(a.name), LOWER(b.name)) AS name_sim,
         EDITDISTANCE(a.domain, b.domain)                     AS domain_dist
  FROM   bronze.raw_records a
  JOIN   bronze.raw_records b
    ON   a.raw_id < b.raw_id                      -- compare each pair once
   AND   SOUNDEX(a.name) = SOUNDEX(b.name)        -- blocking key
),
scored AS (
  SELECT id_a, id_b, cik_a, cik_b,
         -- 70% name similarity + 30% exact-domain bonus; a missing domain scores 0
         (0.7 * name_sim / 100.0) + (0.3 * IFF(domain_dist = 0, 1, 0)) AS match_score
  FROM   pairs
)
SELECT id_a, id_b, match_score
FROM   scored
WHERE  match_score > 0.85 OR cik_a = cik_b;       -- deterministic override: identical CIK always matches
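For readers who want to try the idea outside Snowflake, the blocking-then-scoring pattern can be sketched in plain Python. This sketch makes several substitutions: `difflib.SequenceMatcher` stands in for Jaro-Winkler, the Soundex is a simplified hand-rolled version, and the sample records (including their domains and the CIK placed on the Bloomberg row) are invented for illustration. The names `soundex` and `find_matches` are hypothetical helpers.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}

def soundex(name):
    """Simplified Soundex: first letter plus up to three consonant codes."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out, prev = letters[0].upper(), _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "hw":            # h and w do not separate repeated codes
            prev = code
    return (out + "000")[:4]

def find_matches(records, threshold=0.85):
    """Group records by Soundex block, then score only pairs inside each block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[soundex(rec["name"])].append(rec)
    matches = []
    for block in blocks.values():
        for a, b in combinations(sorted(block, key=lambda r: r["raw_id"]), 2):
            name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            same_domain = a["domain"] is not None and a["domain"] == b["domain"]
            score = 0.7 * name_sim + 0.3 * same_domain
            same_cik = a["cik"] is not None and a["cik"] == b["cik"]
            if score > threshold or same_cik:    # same CIK is a deterministic override
                matches.append((a["raw_id"], b["raw_id"], round(score, 3)))
    return matches

# Invented sample rows; only the three Atlas names come from Example 2.
RECORDS = [
    {"raw_id": "bbg-1", "name": "Atlas Capital Holdings, Inc.", "domain": "atlascap.example", "cik": "0001702010"},
    {"raw_id": "spx-7", "name": "Atlas Capital Hldgs", "domain": "atlascap.example", "cik": "0001702010"},
    {"raw_id": "crm-342", "name": "Atlas Capital Holdings LLC", "domain": "atlascap.example", "cik": None},
    {"raw_id": "crm-900", "name": "Apex Partners LP", "domain": "apex.example", "cik": None},
]

matches = find_matches(RECORDS)
for id_a, id_b, score in matches:
    print(id_a, id_b, score)
```

The unrelated firm lands in a different Soundex block, so it is never compared against the Atlas records at all; that skipped work is what blocking buys.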
2. Build one trusted record: best source wins each field

When the matched records disagree, the most authoritative source wins for each field (for example, S&P for the legal name). The result is a single "golden" record: the best of all three.

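-- source_priority: S&P=1, Bloomberg=2, CRM=3 → lowest wins
-- Simplified: keeps the whole top-priority row per match_group (newest first on ties);
-- per-field survivorship applies the same ranking to each column separately.
SELECT match_group, name, cik, figi, internal_id, source_system
FROM   silver.matched_records
QUALIFY ROW_NUMBER() OVER (
          PARTITION BY match_group
          ORDER BY source_priority, updated_at DESC) = 1;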
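Field-by-field survivorship, as described in the prose for this step, can be sketched in Python as follows. The priorities match the SQL comment (S&P=1, Bloomberg=2, CRM=3), and the identifiers come from Example 2. Two things are assumptions: the field names, and the S&P `legal_name` value, which is set so that the output lines up with the golden record shown earlier. `golden_record` is a hypothetical helper, and the `updated_at` tie-break is left out for brevity.

```python
SOURCE_PRIORITY = {"S&P": 1, "Bloomberg": 2, "CRM": 3}   # lowest number wins

def golden_record(records, fields):
    """For each field, keep the first non-missing value in source-priority order."""
    ranked = sorted(records, key=lambda r: SOURCE_PRIORITY[r["source_system"]])
    golden = {}
    for name in fields:
        for rec in ranked:
            value = rec.get(name)
            if value is not None:
                # Store the source next to the value so every field stays traceable.
                golden[name] = {"value": value, "source": rec["source_system"]}
                break
    return golden

# One matched group: the same firm as seen by three systems.
MATCHED = [
    {"source_system": "Bloomberg", "lei": "5493001KJTIIGC8Y1R12"},
    {"source_system": "S&P", "legal_name": "Atlas Capital Holdings, Inc.", "cik": "0001702010"},
    {"source_system": "CRM", "legal_name": "Atlas Capital Holdings LLC", "internal_id": "PC-00342"},
]

golden = golden_record(MATCHED, ["legal_name", "lei", "cik", "internal_id"])
for name, cell in golden.items():
    print(f'{name}: {cell["value"]} ({cell["source"]})')
```

Unlike a row-level pick, no single source has to be complete: the LEI survives from Bloomberg and the internal ID from the CRM even though S&P ranks first.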
3. Let AI search the documents, and cite its sources

This sets up the search that finds the right passages inside filings and reports, so the AI answers from real source documents and shows exactly where each fact came from.

CREATE OR REPLACE CORTEX SEARCH SERVICE portfolio_docs
  ON          content_chunk
  ATTRIBUTES  golden_id, doc_type, page_number
  WAREHOUSE   = search_wh
  TARGET_LAG  = '1 hour'
  EMBEDDING_MODEL = 'snowflake-arctic-embed-l-v2.0'
AS (SELECT content_chunk, golden_id, doc_type, page_number
    FROM silver.doc_chunks);

-- retrieve grounded, cited passages for an agent or copilot
SELECT value:content_chunk::STRING AS passage,
       value:golden_id::STRING     AS company,
       value:page_number::INT      AS page
FROM TABLE(FLATTEN(INPUT => PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW('portfolio_docs', '{
    "query":   "defense and government exposure",
    "columns": ["content_chunk","golden_id","page_number"],
    "limit":   3 }'))['results']));   -- matches are returned under the "results" key
⚖️

Native vs. external: when I'd reach outside Snowflake

Default to Snowflake-native so AI stays inside the governance boundary. Reach out only with a clear reason: each external hop is a governance seam to defend.

Retrieval / RAG
Native ✓ default: Cortex Search (governed, cited, zero data movement).
Reach out when: >100M vectors at sub-10ms → Pinecone / pgvector.
Heavy ML training
Native ✓ default: Snowpark ML for in-platform models.
Reach out when: large-scale training → Databricks + MLflow over shared Iceberg.
Frontier / multi-model
Native ✓ default: Cortex serves Claude & GPT inside the boundary.
Reach out when: provider routing, cost caps & failover → thin AI gateway.

The principle: if a governed view and a defined metric answer the question, that beats an agent. AI-forward by default, never AI for its own sake.

Track record

Delivered in production, at Fortune 500 scale

19+ years across financial services and federal regulated systems. Sanitized for confidentiality; the patterns are exactly what this role needs.

70% / 95%

Cut manual document processing 70% at 95% accuracy with AI on Snowflake, in regulated financial data.

½ PB

Built & governed a half-petabyte Snowflake platform; canonical models adopted across Finance, Marketing & Operations.

$500K+

Annual savings from smarter warehouse sizing and cost controls, at full production scale.

Selected migrations & platforms
FINANCIAL SERVICES · ½ PB
Legacy warehouse → Snowflake

Led a multi-source migration to a half-petabyte Snowflake lakehouse with Kimball-modeled marts and a shared semantic layer.

REGULATED · MASTER DATA
MDM & golden records

Built fuzzy-matching + survivorship pipelines producing one trusted record per entity across competing source systems.

AI ON GOVERNED DATA
RAG & document intelligence

Production retrieval-augmented generation over governed documents: cited, logged, and quality-checked for a regulated firm.

Signals of depth
Fortune 500 migrations | Petabyte-scale Snowflake DW | 4× AWS Certified | AWS Authorized Instructor | Kimball dimensional modeling | Master Data Management | RAG · Cortex · agents | Palantir Foundry | Financial services + federal
Who built this

Harnoor Minhas, Senior Agentic AI & Hands-on Data Architect

Senior AI & Data Architect: 19+ years, half-petabyte Snowflake in production with Cortex AI, master & reference data, RAG, and Palantir Foundry, across financial services and federal regulated systems. Reston, VA · the 4-day DC cadence works well.


Made by Harnoor Minhas · May 2026 · reference-architecture demo · illustrative data · not a live production system · Snowflake feature names & SQL accurate to current docs · Carlyle stats from public disclosures (NASDAQ: CG).