DPCH05 — Self-Describing

“Rich Metadata Embedded”

What DPCH05 is really asserting

DPCH05 is not asserting that:

“Some metadata fields are filled in.”

It is asserting that:

A Data Product carries enough embedded, product-level information for a consumer (human or machine) to understand its meaning, scope, structure, lineage, and usage context without relying on external explanations.

Self-describing means the product explains itself.


The Essence (HDIP + Data Mesh Interpretation)

A Data Product is self-describing if and only if:

  1. Meaning is explicit, not implied
  2. Context travels with the product
  3. Understanding does not depend on people, systems, or tribal knowledge

If understanding requires:

  • asking the producer,
  • reading pipeline code,
  • reverse-engineering schemas,

then DPCH05 is not met, even if metadata exists somewhere.


Positive Criteria — When DPCH05 is met

DPCH05 is met when all of the following are true:

1. Business meaning is explicit and authoritative

The product clearly describes:

  • what business concept(s) it represents
  • how those concepts are defined
  • what is in scope and out of scope

Descriptions are written in business language, not system language.


2. Structure is explained, not just exposed

The product includes:

  • a clear description of its entities, events, and measures
  • explanation of key fields (not just names)
  • units, time semantics, and aggregation meaning where relevant

A schema without explanation is not self-describing.


3. Lineage and provenance are visible

Consumers can see:

  • where the data comes from (at product level)
  • major upstream dependencies
  • transformation intent (at a conceptual level)

This supports trust without exposing technical pipelines.


4. Usage context and constraints are stated

The product declares:

  • intended use cases
  • known limitations
  • freshness expectations
  • quality posture (“best effort”, “regulated reporting”, etc.)

This allows consumers to self-assess fitness for purpose.
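
As an illustration of the four positive criteria, the following is a minimal sketch of a descriptor that travels with the product, plus a presence check an agent might run against it. The class name `ProductDescriptor`, its field names, and `missing_criteria` are all hypothetical and not part of any standard. The check only detects empty sections; it cannot judge whether the text is actually written in business language.

```python
from dataclasses import dataclass, field


@dataclass
class ProductDescriptor:
    """Hypothetical embedded descriptor; all field names are illustrative."""
    # 1. Business meaning
    business_concepts: list[str] = field(default_factory=list)
    definitions: dict[str, str] = field(default_factory=dict)
    scope: str = ""
    # 2. Structure: field name -> plain-language explanation
    field_explanations: dict[str, str] = field(default_factory=dict)
    # 3. Lineage and provenance (product level, not pipeline level)
    sources: list[str] = field(default_factory=list)
    transformation_intent: str = ""
    # 4. Usage context and constraints
    intended_uses: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)
    freshness: str = ""
    quality_posture: str = ""


def missing_criteria(d: ProductDescriptor) -> list[str]:
    """Return the positive criteria for which the descriptor has empty parts."""
    checks = {
        "business meaning": d.business_concepts and d.definitions and d.scope,
        "structure": d.field_explanations,
        "lineage": d.sources and d.transformation_intent,
        "usage context": (d.intended_uses and d.known_limitations
                          and d.freshness and d.quality_posture),
    }
    return [name for name, present in checks.items() if not present]
```

An empty list from `missing_criteria` means only that every section has been filled in, which is necessary but not sufficient for DPCH05.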


Negative Criteria — When DPCH05 is not met

DPCH05 is not met if any of the following are true:

❌ Metadata exists but is technical or opaque

Examples:

  • table names and column names only
  • system-oriented descriptions
  • abbreviations without definitions

This describes storage, not meaning.


❌ Meaning lives outside the product

Examples:

  • definitions only in Confluence
  • explanations only in slide decks
  • “ask the team” as the documentation strategy

This creates dependency and does not scale.


❌ Lineage or quality context is absent

Examples:

  • no indication of sources
  • no freshness expectations
  • no explanation of update behavior

Consumers cannot assess trust or suitability.
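
The first negative criterion (technical or opaque metadata) lends itself to a rough automated screen. The sketch below is a heuristic only: the function name `opaque_fields`, the idea of passing a glossary of defined acronyms, and the rule that any run of two or more capital letters counts as an acronym are all assumptions made for illustration. A clean result does not prove the descriptions carry business meaning.

```python
import re


def _squash(text: str) -> str:
    """Lowercase and drop punctuation, spaces, and underscores."""
    return re.sub(r"[\W_]+", "", text).lower()


def opaque_fields(fields: dict[str, str], glossary: set[str]) -> dict[str, str]:
    """Map field name -> reason its description looks opaque (heuristic)."""
    problems = {}
    for name, description in fields.items():
        text = description.strip()
        if not text:
            problems[name] = "no description"
        elif _squash(text) == _squash(name):
            # The description just restates the column name.
            problems[name] = "description repeats the field name"
        else:
            # Assumption: 2+ consecutive capitals is an acronym needing a definition.
            acronyms = set(re.findall(r"\b[A-Z]{2,}\b", text))
            undefined = sorted(acronyms - glossary)
            if undefined:
                problems[name] = "undefined acronyms: " + ", ".join(undefined)
    return problems
```

Fields flagged here are candidates for human review; the other two negative criteria (meaning living outside the product, absent lineage or quality context) need different evidence.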


Edge Cases (Important Guidance for Agents)

Case 1: “Schema documented, but business meaning unclear”

⚠️ Partial

Rationale:

  • Some descriptive effort exists
  • Still requires external interpretation
  • Common transitional state

Case 2: “Auto-generated metadata only”

Not met

Rationale:

  • Automation without semantics is insufficient
  • Self-description must include intent, not just structure

Case 3: “Rich product page with business narrative”

Met

Rationale:

  • Product explains itself
  • Consumers can decide independently
  • Meetings become optional, not required
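
The three edge cases above can be sketched as a small decision function. This is an illustrative simplification, not a definitive rule set: the `Verdict` enum, the `assess` function, and its three boolean inputs are hypothetical names, and the final fallback to "Not met" for situations the cases do not cover is an assumption of this sketch.

```python
from enum import Enum


class Verdict(Enum):
    MET = "Met"
    PARTIAL = "Partial"
    NOT_MET = "Not met"


def assess(schema_documented: bool,
           business_meaning_explicit: bool,
           auto_generated_only: bool) -> Verdict:
    """Map the three documented edge cases to a verdict."""
    if auto_generated_only:
        # Case 2: automation without semantics is insufficient.
        return Verdict.NOT_MET
    if schema_documented and business_meaning_explicit:
        # Case 3: the product explains itself.
        return Verdict.MET
    if schema_documented:
        # Case 1: descriptive effort exists, but interpretation is still external.
        return Verdict.PARTIAL
    # Assumption: anything outside the three cases is treated conservatively.
    return Verdict.NOT_MET
```

In practice the boolean inputs are themselves judgments, so an agent would normally derive them from the evidence it has gathered rather than take them as given.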

Evidence Signals an Agent Should Look For

Authoritative evidence:

  • Product-level description written in business terms
  • Defined entities/events/measures with explanations
  • Lineage summary linked to the product

Supporting evidence:

  • Glossary term links
  • Quality/freshness statements
  • Usage examples

Red flags:

  • Documentation focused on pipelines or tables
  • Acronyms without definitions
  • Metadata copied verbatim from systems
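
One way an agent might record what it finds is to sort observed signals into the three groups listed above. The signal labels and the `evidence_report` function below are hypothetical names chosen for this sketch; the source does not define identifiers or any scoring scheme, so the function only groups signals and does not weigh them.

```python
# Hypothetical labels for the signals listed above.
AUTHORITATIVE = ("business_description", "explained_entities", "lineage_summary")
SUPPORTING = ("glossary_links", "quality_freshness_statements", "usage_examples")
RED_FLAGS = ("pipeline_focused_docs", "undefined_acronyms", "system_copied_metadata")


def evidence_report(observed: set[str]) -> dict[str, list[str]]:
    """Group observed signals by category and list missing authoritative ones."""
    return {
        "authoritative_found": [s for s in AUTHORITATIVE if s in observed],
        "authoritative_missing": [s for s in AUTHORITATIVE if s not in observed],
        "supporting_found": [s for s in SUPPORTING if s in observed],
        "red_flags": [s for s in RED_FLAGS if s in observed],
    }
```

For example, `evidence_report({"business_description", "undefined_acronyms"})` reports one authoritative signal found, two missing, and one red flag, which gives a reviewer a concrete starting point.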

How an AI Agent Should Decide

Decision rule (simplified):

If a competent consumer cannot understand what the data represents and how it should be used without talking to the producer, DPCH05 is not met.


Why DPCH05 Is Non-Negotiable

Without DPCH05:

  • discoverability leads to confusion
  • reuse creates risk
  • governance cannot scale
  • AI consumption becomes dangerous

Self-description is what turns data into a trustworthy product, not just an exposed asset.


Canonical Statement (for BPS)

DPCH05 is satisfied only when a Data Product embeds sufficient business meaning, structure explanation, lineage context, and usage guidance to be understood and evaluated independently by consumers.