Document Purpose: Clarify the fundamental distinction between metadata (taxonomies) and data (instances) in XBRL, and how file extensions and formats work
Last Updated: January 2026
Target Audience: XBRL developers and anyone confused by XBRL's architecture
The Most Important Concept in XBRL:
XBRL separates metadata (taxonomies) from data (instances) in a way that confuses many people because it differs from traditional XML architecture.
Critical Distinction:
| Format | Metadata (Taxonomy) | Data (Instance) |
|---|---|---|
| Traditional XML | .xsd schema file | .xml instance document |
| XBRL | .xsd + .xml linkbases | .xbrl instance document |
| xBRL-JSON | .xsd + .xml linkbases | .json instance document |
| xBRL-CSV | .xsd + .xml linkbases | .json metadata + .csv tables |
The Confusion:
In traditional XML, .xml = data and .xsd = metadata.
In XBRL, both .xsd and .xml are metadata (taxonomy), while .xbrl = data.
This document explains why, how it works, and how to think about it correctly.
In traditional XML, there's a clear separation:
<!-- customer-data.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<customer>
<id>12345</id>
<name>John Doe</name>
<email>john@example.com</email>
<age>35</age>
</customer>
Purpose: Contains actual data values
Extension: .xml
Role: Instance document
<!-- customer-schema.xsd -->
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="customer">
<xs:complexType>
<xs:sequence>
<xs:element name="id" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Purpose: Defines structure and data types
Extension: .xsd
Role: Schema (metadata)
┌─────────────────────────────────────┐
│ customer-schema.xsd │
│ (Metadata/Schema) │
│ │
│ Defines: structure, types, rules │
└─────────────────────────────────────┘
│
│ validates
▼
┌─────────────────────────────────────┐
│ customer-data.xml │
│ (Data/Instance) │
│ │
│ Contains: actual customer data │
└─────────────────────────────────────┘
Simple Rule:
.xsd = Metadata (what CAN be)
.xml = Data (what IS)
This is intuitive and easy to understand.
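The "schema validates instance" relationship can be sketched with the JDK's built-in `javax.xml.validation` API. This is a minimal illustration, not a recommended production setup: the class name `CustomerValidation` is an assumption, and the schema is inlined as a string (mirroring customer-schema.xsd above) only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: validates an XML instance (data) against an XSD (metadata).
final class CustomerValidation {

    // Same structure as customer-schema.xsd above, inlined for a self-contained example.
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='customer'><xs:complexType><xs:sequence>"
        + "<xs:element name='id' type='xs:string'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='email' type='xs:string'/>"
        + "<xs:element name='age' type='xs:integer'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns true when instanceXml conforms to schemaXml, false on any schema or validation error. */
    static boolean isValid(String schemaXml, String instanceXml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(schemaXml)));
            // With no custom ErrorHandler, validate() throws SAXException on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `CustomerValidation.isValid(CustomerValidation.SCHEMA, xml)` with the customer-data.xml content should return true, while a document whose `age` is not an integer should return false.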
XBRL needs to express much more than traditional XML schemas:
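Traditional XML Schema can express: element names, data types, and document structure.
XBRL Taxonomies need to express: all of that, plus human-readable labels in multiple languages, calculation (summation) relationships, and presentation hierarchies.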
Solution: XBRL uses XML Schema (.xsd) for basic structure, then adds separate linkbase files (.xml) for relationships and labels.
┌──────────────────────────────────────────────────────────────┐
│ XBRL TAXONOMY │
│ (METADATA) │
├──────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ concepts.xsd │ │ labels.xml │ │
│ │ (Schema) │◄────────┤ (Linkbase) │ │
│ │ │ │ │ │
│ │ Defines: │ │ Defines: │ │
│ │ - Element names │ │ - Human labels │ │
│ │ - Data types │ │ - Descriptions │ │
│ │ - Abstract items │ │ - Multiple langs │ │
│ └────────────────────┘ └────────────────────┘ │
│ │ │ │
│ │ │ │
│ │ ┌────────────────────┴──────┐ │
│ │ │ │ │
│ │ ┌────────────────────┐ ┌──────────────┐ │
│ │ │ calculation.xml │ │ presentation│ │
│ └───►│ (Linkbase) │ │ .xml │ │
│ │ │ │ (Linkbase) │ │
│ │ Defines: │ │ │ │
│ │ - Summations │ │ Defines: │ │
│ │ - Relationships │ │ - Display │ │
│ └────────────────────┘ │ - Hierarchy │ │
│ └──────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
│
│ defines structure for
▼
┌──────────────────────────────────────────────────────────────┐
│ company-report.xbrl │
│ (INSTANCE - DATA) │
│ │
│ Contains: actual financial data, facts, contexts, units │
└──────────────────────────────────────────────────────────────┘
The Key Point:
.xsd + .xml linkbases = Metadata (taxonomy)
.xbrl = Data (instance)
1. Schema Files (.xsd)
<!-- us-gaap-2024.xsd -->
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xbrli="http://www.xbrl.org/2003/instance"
targetNamespace="http://fasb.org/us-gaap/2024">
<!-- Concept definitions -->
<xs:element name="Revenue"
type="xbrli:monetaryItemType"
substitutionGroup="xbrli:item"
xbrli:periodType="duration"/>
<xs:element name="Assets"
type="xbrli:monetaryItemType"
substitutionGroup="xbrli:item"
xbrli:periodType="instant"/>
</xs:schema>
Purpose: Define concept names and data types
Extension: .xsd
Role: Part of taxonomy (metadata)
2. Linkbase Files (.xml)
Label Linkbase (.xml)
<!-- us-gaap-2024-label.xml -->
<link:linkbase
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:xlink="http://www.w3.org/1999/xlink">
<link:labelLink>
<link:loc xlink:type="locator"
xlink:href="us-gaap-2024.xsd#Revenue"
xlink:label="Revenue_loc"/>
<link:label xlink:type="resource"
xlink:label="Revenue_label"
xlink:role="http://www.xbrl.org/2003/role/label"
xml:lang="en">Revenue</link:label>
<link:labelArc xlink:type="arc"
xlink:from="Revenue_loc"
xlink:to="Revenue_label"/>
</link:labelLink>
</link:linkbase>
Purpose: Human-readable labels, documentation
Extension: .xml
Role: Part of taxonomy (metadata)
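To make the locator, resource, and arc pattern concrete, here is a hedged sketch that reads a label linkbase with the JDK DOM parser and maps each concept to its label text. `LabelLookup` and `labelsByConcept` are illustrative names, and the sketch deliberately ignores things a real processor must handle (extended link scoping, label roles, languages, arc prohibition).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: follows loc -> labelArc -> label inside a label linkbase.
final class LabelLookup {
    static final String LINK_NS = "http://www.xbrl.org/2003/linkbase";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    /** Maps concept fragment (e.g. "Revenue") to label text. */
    static Map<String, String> labelsByConcept(String linkbaseXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(linkbaseXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("linkbase is not parseable XML", e);
        }

        // xlink:label of each locator -> concept fragment taken from its href
        Map<String, String> conceptByLocator = new HashMap<>();
        NodeList locs = doc.getElementsByTagNameNS(LINK_NS, "loc");
        for (int i = 0; i < locs.getLength(); i++) {
            Element loc = (Element) locs.item(i);
            String href = loc.getAttributeNS(XLINK_NS, "href");
            conceptByLocator.put(loc.getAttributeNS(XLINK_NS, "label"),
                href.substring(href.indexOf('#') + 1));
        }

        // xlink:label of each label resource -> its text content
        Map<String, String> textByResource = new HashMap<>();
        NodeList labels = doc.getElementsByTagNameNS(LINK_NS, "label");
        for (int i = 0; i < labels.getLength(); i++) {
            Element label = (Element) labels.item(i);
            textByResource.put(label.getAttributeNS(XLINK_NS, "label"),
                label.getTextContent());
        }

        // Each arc connects one locator (from) to one resource (to)
        Map<String, String> result = new LinkedHashMap<>();
        NodeList arcs = doc.getElementsByTagNameNS(LINK_NS, "labelArc");
        for (int i = 0; i < arcs.getLength(); i++) {
            Element arc = (Element) arcs.item(i);
            String concept = conceptByLocator.get(arc.getAttributeNS(XLINK_NS, "from"));
            String text = textByResource.get(arc.getAttributeNS(XLINK_NS, "to"));
            if (concept != null && text != null) {
                result.put(concept, text);
            }
        }
        return result;
    }
}
```

Note that nothing here touches an instance document: the whole lookup runs over taxonomy files, which is why linkbases count as metadata.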
Calculation Linkbase (.xml)
<!-- us-gaap-2024-calculation.xml -->
<link:linkbase
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:xlink="http://www.w3.org/1999/xlink">
<link:calculationLink>
<!-- Revenue = Product Sales + Service Revenue -->
<link:loc xlink:href="us-gaap-2024.xsd#Revenue"
xlink:label="Revenue_loc"/>
<link:loc xlink:href="us-gaap-2024.xsd#ProductSales"
xlink:label="ProductSales_loc"/>
<link:loc xlink:href="us-gaap-2024.xsd#ServiceRevenue"
xlink:label="ServiceRevenue_loc"/>
<link:calculationArc xlink:from="Revenue_loc"
xlink:to="ProductSales_loc"
order="1" weight="1.0"/>
<link:calculationArc xlink:from="Revenue_loc"
xlink:to="ServiceRevenue_loc"
order="2" weight="1.0"/>
</link:calculationLink>
</link:linkbase>
Purpose: Define calculation relationships
Extension: .xml
Role: Part of taxonomy (metadata)
<!-- acme-2024-q4.xbrl -->
<?xml version="1.0" encoding="UTF-8"?>
<xbrl xmlns="http://www.xbrl.org/2003/instance"
xmlns:us-gaap="http://fasb.org/us-gaap/2024"
xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:xlink="http://www.w3.org/1999/xlink">
<!-- Schema reference (links to taxonomy) -->
<link:schemaRef xlink:type="simple"
xlink:href="http://fasb.org/us-gaap/2024/us-gaap-2024.xsd"/>
<!-- Context -->
<context id="FY2024">
<entity>
<identifier scheme="http://www.sec.gov/CIK">0001234567</identifier>
</entity>
<period>
<startDate>2024-01-01</startDate>
<endDate>2024-12-31</endDate>
</period>
</context>
<!-- Unit -->
<unit id="USD">
<measure>iso4217:USD</measure>
</unit>
<!-- Facts (actual data) -->
<us-gaap:Revenue contextRef="FY2024" unitRef="USD" decimals="0">
1000000
</us-gaap:Revenue>
<us-gaap:ProductSales contextRef="FY2024" unitRef="USD" decimals="0">
700000
</us-gaap:ProductSales>
<us-gaap:ServiceRevenue contextRef="FY2024" unitRef="USD" decimals="0">
300000
</us-gaap:ServiceRevenue>
</xbrl>
Purpose: Contains actual financial data
Extension: .xbrl
Role: Instance document (data)
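As a rough illustration of what "the instance holds the data" means in code, the sketch below pulls the facts out of an xBRL-XML instance using the JDK DOM parser. It assumes that any top-level element carrying a `contextRef` attribute is a fact, which is a simplification (tuples, footnotes, and nil facts are ignored), and the names `InstanceFacts` and `read` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the facts found directly under the <xbrl> root.
final class InstanceFacts {

    /** Returns "localName|contextRef" -> trimmed fact value, in document order. */
    static Map<String, String> read(String instanceXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(instanceXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("instance is not parseable XML", e);
        }

        Map<String, String> facts = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            // Contexts, units, and schemaRef have no contextRef, so they are skipped.
            if (node instanceof Element && ((Element) node).hasAttribute("contextRef")) {
                Element fact = (Element) node;
                String key = fact.getLocalName() + "|" + fact.getAttribute("contextRef");
                facts.put(key, fact.getTextContent().trim());
            }
        }
        return facts;
    }
}
```

For the acme-2024-q4.xbrl example above, `InstanceFacts.read(...)` would yield three entries such as `Revenue|FY2024` mapped to `1000000`. The concept's type and label are not in the result, because they live in the taxonomy.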
TRADITIONAL XML:
═══════════════
.xsd → Metadata
.xml → Data
XBRL:
═════
.xsd + .xml → Metadata (taxonomy)
.xbrl → Data (instance)
Why .xml for linkbases?
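Problem: How do you express relationships between concepts?
For example: that Revenue is the sum of ProductSales and ServiceRevenue, or that the concept Revenue has the English label "Revenue".
Solution: Use XLink to create links between concepts declared in the schema and resources such as labels, or between one concept and another: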
<!-- Example: Linking a concept to its label -->
<!-- Step 1: Locate the concept -->
<link:loc xlink:type="locator"
xlink:href="us-gaap-2024.xsd#Revenue"
xlink:label="Revenue_loc"/>
<!-- Step 2: Define the label resource -->
<link:label xlink:type="resource"
xlink:label="Revenue_label"
xml:lang="en">Revenue</link:label>
<!-- Step 3: Create arc connecting them -->
<link:labelArc xlink:type="arc"
xlink:from="Revenue_loc"
xlink:to="Revenue_label"/>
XLink Elements:
xlink:type="locator" - Points to a concept
xlink:type="resource" - Contains information (label, doc, etc.)
xlink:type="arc" - Creates relationship between locator and resource
<!-- Reference to concept in schema -->
xlink:href="us-gaap-2024.xsd#Revenue"
↑ ↑
schema file concept ID (fragment identifier)
XPointer is the #Revenue part - it identifies a specific element within the XML document.
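A small sketch of how a processor might split such an href into its two parts, using `java.net.URI` from the JDK. The class name `ConceptHref` is an assumption for illustration. In practice the fragment is matched against an `id` attribute in the target schema, which this sketch does not attempt.

```java
import java.net.URI;

// Hypothetical value type: the two halves of an xlink:href such as "us-gaap-2024.xsd#Revenue".
final class ConceptHref {
    final String document;  // part before '#', e.g. "us-gaap-2024.xsd"
    final String fragment;  // part after '#', e.g. "Revenue"

    private ConceptHref(String document, String fragment) {
        this.document = document;
        this.fragment = fragment;
    }

    /** Splits an href into document and fragment; rejects hrefs without a fragment. */
    static ConceptHref parse(String href) {
        // URI.create validates the syntax and exposes the fragment identifier.
        String fragment = URI.create(href).getFragment();
        if (fragment == null || fragment.isEmpty()) {
            throw new IllegalArgumentException("href has no fragment: " + href);
        }
        String document = href.substring(0, href.indexOf('#'));
        return new ConceptHref(document, fragment);
    }
}
```

The same split works for relative hrefs (as in the linkbase examples) and for absolute URLs.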
<link:calculationLink>
<!-- Parent concept: Total Assets -->
<link:loc xlink:href="schema.xsd#Assets"
xlink:label="Assets_loc"/>
<!-- Child concept: Current Assets -->
<link:loc xlink:href="schema.xsd#CurrentAssets"
xlink:label="CurrentAssets_loc"/>
<!-- Child concept: Non-Current Assets -->
<link:loc xlink:href="schema.xsd#NonCurrentAssets"
xlink:label="NonCurrentAssets_loc"/>
<!-- Relationship: Assets = CurrentAssets + NonCurrentAssets -->
<link:calculationArc
xlink:from="Assets_loc"
xlink:to="CurrentAssets_loc"
order="1.0"
weight="1.0"/> <!-- +1 means addition -->
<link:calculationArc
xlink:from="Assets_loc"
xlink:to="NonCurrentAssets_loc"
order="2.0"
weight="1.0"/> <!-- +1 means addition -->
</link:calculationLink>
This expresses: Assets = CurrentAssets + NonCurrentAssets
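The arithmetic implied by calculation arcs can be sketched as a weighted sum: the parent's value should equal the sum of each child's value multiplied by its arc weight (a weight of -1.0 would subtract). The class `CalculationCheck` and its method are hypothetical, and the sketch skips the rounding and `decimals` handling that a real XBRL calculation check applies.

```java
import java.math.BigDecimal;
import java.util.Map;

// Hypothetical helper: checks parent == sum(weight * child) for one calculation relationship.
final class CalculationCheck {

    /**
     * @param parent       name of the summation (parent) concept
     * @param childWeights child concept name -> arc weight (1.0 adds, -1.0 subtracts)
     * @param facts        concept name -> reported value
     */
    static boolean sumsToParent(String parent,
                                Map<String, BigDecimal> childWeights,
                                Map<String, BigDecimal> facts) {
        BigDecimal total = BigDecimal.ZERO;
        for (Map.Entry<String, BigDecimal> arc : childWeights.entrySet()) {
            BigDecimal childValue = facts.get(arc.getKey());
            if (childValue == null) {
                continue; // in this sketch an unreported child simply contributes nothing
            }
            total = total.add(arc.getValue().multiply(childValue));
        }
        BigDecimal parentValue = facts.get(parent);
        // compareTo ignores scale, so 1000000 and 1000000.0 count as equal
        return parentValue != null && parentValue.compareTo(total) == 0;
    }
}
```

With the earlier Revenue example (700000 + 300000 against a reported 1000000) the check passes; changing either child value makes it fail.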
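Without XLink/XPointer, linkbases couldn't point at concepts defined in a separate schema file, attach labels to them, or express relationships between them.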
Taxonomy (Metadata):
us-gaap-2024/
├── us-gaap-2024.xsd ← Schema (concepts)
├── us-gaap-2024-label.xml ← Labels
├── us-gaap-2024-presentation.xml ← Display structure
├── us-gaap-2024-calculation.xml ← Summations
└── us-gaap-2024-definition.xml ← Dimensions
Instance (Data):
company-report-2024.xbrl ← Financial data
File Extensions:
Taxonomy: .xsd + .xml
Instance: .xbrl
Taxonomy (Metadata):
Same as xBRL-XML:
├── schema.xsd
├── label.xml
├── presentation.xml
└── calculation.xml
Instance (Data):
<!-- annual-report-2024.html or .xhtml -->
<!DOCTYPE html>
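<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
xmlns:ixt="http://www.xbrl.org/inlineXBRL/transformation/2015-02-26"
xmlns:us-gaap="http://fasb.org/us-gaap/2024">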
<head>
<title>Annual Report 2024</title>
</head>
<body>
<h1>Financial Highlights</h1>
<p>
Revenue for 2024 was
<ix:nonFraction name="us-gaap:Revenue"
contextRef="FY2024"
unitRef="USD"
decimals="0"
format="ixt:numdotdecimal">1,000,000</ix:nonFraction>
dollars.
</p>
<!-- Hidden facts -->
<ix:hidden>
<ix:nonFraction name="us-gaap:ProductSales"
contextRef="FY2024"
unitRef="USD"
decimals="0">700000</ix:nonFraction>
</ix:hidden>
</body>
</html>
File Extensions:
Taxonomy: .xsd + .xml (same as xBRL-XML)
Instance: .html or .xhtml (human-readable)
Key Point: Inline XBRL instances are .html files, but they reference the same .xsd + .xml taxonomy files!
Taxonomy (Metadata):
Same as xBRL-XML:
├── schema.xsd
├── label.xml
├── presentation.xml
└── calculation.xml
Instance (Data):
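{
  "documentInfo": {
    "documentType": "https://xbrl.org/2021/xbrl-json",
    "taxonomy": ["http://fasb.org/us-gaap/2024/us-gaap-2024.xsd"],
    "namespaces": {
      "us-gaap": "http://fasb.org/us-gaap/2024",
      "iso4217": "http://www.xbrl.org/2003/iso4217",
      "cik": "http://www.sec.gov/CIK"
    }
  },
  "facts": {
    "f1": {
      "value": "1000000",
      "decimals": 0,
      "dimensions": {
        "concept": "us-gaap:Revenue",
        "entity": "cik:0001234567",
        "period": "2024-01-01T00:00:00/2025-01-01T00:00:00",
        "unit": "iso4217:USD"
      }
    },
    "f2": {
      "value": "700000",
      "decimals": 0,
      "dimensions": {
        "concept": "us-gaap:ProductSales",
        "entity": "cik:0001234567",
        "period": "2024-01-01T00:00:00/2025-01-01T00:00:00",
        "unit": "iso4217:USD"
      }
    },
    "f3": {
      "value": "300000",
      "decimals": 0,
      "dimensions": {
        "concept": "us-gaap:ServiceRevenue",
        "entity": "cik:0001234567",
        "period": "2024-01-01T00:00:00/2025-01-01T00:00:00",
        "unit": "iso4217:USD"
      }
    }
  }
}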
File Extensions:
Taxonomy: .xsd + .xml (STILL the same!)
Instance: .json (data in JSON format)
Critical Point: Even though the instance is JSON, the taxonomy is STILL .xsd + .xml files using XLink!
Taxonomy (Metadata):
Same as xBRL-XML:
├── schema.xsd
├── label.xml
├── presentation.xml
└── calculation.xml
Instance (Data):
CSV Metadata (.json):
{
  "documentInfo": {
    "documentType": "https://xbrl.org/2021/xbrl-csv",
    "taxonomy": ["http://fasb.org/us-gaap/2024/us-gaap-2024.xsd"],
    "namespaces": {
      "us-gaap": "http://fasb.org/us-gaap/2024",
      "iso4217": "http://www.xbrl.org/2003/iso4217",
      "cik": "http://www.sec.gov/CIK"
    }
  },
  "tableTemplates": {
    "revenue-table": {
      "columns": {
        "concept": {},
        "period": {},
        "value": {
          "dimensions": {
            "concept": "$concept",
            "period": "$period",
            "entity": "cik:0001234567",
            "unit": "iso4217:USD"
          }
        }
      }
    }
  },
  "tables": {
    "revenue-data": {
      "template": "revenue-table",
      "url": "revenue-data.csv"
    }
  }
}
CSV Data (.csv):
concept,period,value
us-gaap:Revenue,2024,1000000
us-gaap:ProductSales,2024,700000
us-gaap:ServiceRevenue,2024,300000
File Extensions:
Taxonomy: .xsd + .xml (STILL the same!)
Instance: .json (metadata) + .csv (data tables)
JSON Schema (Metadata):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"id": { "type": "string" },
"name": { "type": "string" },
"age": { "type": "integer", "minimum": 0 }
},
"required": ["id", "name"]
}
JSON Instance (Data):
{
"id": "12345",
"name": "John Doe",
"age": 35
}
File Extensions:
Metadata: .json (JSON Schema)
Data: .json (JSON instance)
Note: Both use the .json extension, so you must distinguish them by content or naming convention.
CSV Metadata (Table Schema):
{
"@context": "http://www.w3.org/ns/csvw",
"url": "customer-data.csv",
"tableSchema": {
"columns": [
{
"name": "id",
"datatype": "string",
"required": true
},
{
"name": "name",
"datatype": "string"
},
{
"name": "age",
"datatype": "integer"
}
]
}
}
CSV Data:
id,name,age
12345,John Doe,35
12346,Jane Smith,28
File Extensions:
Metadata: .json (CSV metadata/schema)
Data: .csv (CSV file)
REGULAR JSON:
═════════════
Metadata: JSON Schema (.json file)
Data: JSON document (.json file)
└─ Both are .json, distinguished by content
xBRL-JSON:
══════════
Metadata: XBRL Taxonomy (.xsd + .xml files)
Data: JSON document (.json file)
└─ Taxonomy uses XML Schema + XLink
└─ Instance uses JSON format
Key Difference: xBRL-JSON uses traditional XBRL taxonomy architecture (XSD + XLink), but instances are JSON.
REGULAR CSV:
════════════
Metadata: Table Schema (.json file)
Data: CSV file (.csv file)
xBRL-CSV:
═════════
Metadata: XBRL Taxonomy (.xsd + .xml files)
Data: CSV metadata (.json) + CSV files (.csv)
└─ Taxonomy uses XML Schema + XLink
└─ Instance uses JSON metadata + CSV data
Key Difference: xBRL-CSV uses traditional XBRL taxonomy architecture for concept definitions, but data is in CSV format.
Problem: Different formats (XML, JSON, CSV) but same taxonomy.
Solution: OIM provides an abstract model that all formats map to.
┌──────────────────────┐
│ XBRL Taxonomy │
│ (.xsd + .xml files) │
│ │
│ Defines: concepts, │
│ labels, rules │
└──────────────────────┘
│
┌─────────┴─────────┐
│ OIM Abstract │
│ Data Model │
│ │
│ Facts, Dimensions│
│ Aspects, Values │
└───────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ xBRL-XML │ │ xBRL-JSON │ │ xBRL-CSV │
│ (.xbrl) │ │ (.json) │ │ (.json + .csv)│
└────────────────┘ └────────────────┘ └────────────────┘
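Key Insight: Every format carries the same facts; only the serialization differs. The OIM describes those facts with a few core terms: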
Fact: A single piece of data
Dimensions: Context information (entity, period, scenario, dimensions)
Aspects: Properties of a fact (concept, dimensions, unit)
Example Fact:
Concept: us-gaap:Revenue
Entity: Acme Corp (CIK 0001234567)
Period: 2024-01-01 to 2024-12-31
Unit: USD
Value: 1,000,000
Same fact in different formats:
xBRL-XML:
<us-gaap:Revenue contextRef="FY2024" unitRef="USD" decimals="0">
1000000
</us-gaap:Revenue>
xBRL-JSON:
"f1": {
"value": "1000000",
"decimals": 0,
"dimensions": {
"concept": "us-gaap:Revenue",
"entity": "cik:0001234567",
"period": "2024-01-01T00:00:00/2025-01-01T00:00:00",
"unit": "iso4217:USD"
}
}
xBRL-CSV:
concept,entity,period,unit,value
us-gaap:Revenue,0001234567,2024,USD,1000000
All three represent the SAME OIM fact, using the SAME taxonomy!
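One way to picture the abstract model is a single value type that every format-specific reader produces. The sketch below is a deliberately simplified stand-in for an OIM fact: the class `OimFact`, its field names, and the `fromCsvRow` helper are assumptions for illustration, and it compares entity and period as plain strings, whereas a real implementation would first normalize each format's representation.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Hypothetical, simplified model of one fact, independent of the file format it came from.
final class OimFact {
    final String concept;
    final String entity;
    final String period;
    final String unit;
    final BigDecimal value;

    OimFact(String concept, String entity, String period, String unit, BigDecimal value) {
        this.concept = concept;
        this.entity = entity;
        this.period = period;
        this.unit = unit;
        this.value = value;
    }

    /** Builds a fact from one row laid out as: concept,entity,period,unit,value. */
    static OimFact fromCsvRow(String row) {
        String[] cells = row.split(",", -1);
        if (cells.length != 5) {
            throw new IllegalArgumentException("expected 5 cells, got " + cells.length);
        }
        return new OimFact(cells[0].trim(), cells[1].trim(), cells[2].trim(),
                           cells[3].trim(), new BigDecimal(cells[4].trim()));
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof OimFact)) {
            return false;
        }
        OimFact that = (OimFact) other;
        // compareTo treats 1000000 and 1000000.0 as the same numeric value
        return concept.equals(that.concept) && entity.equals(that.entity)
            && period.equals(that.period) && unit.equals(that.unit)
            && value.compareTo(that.value) == 0;
    }

    @Override
    public int hashCode() {
        // stripTrailingZeros keeps hashCode consistent with the scale-insensitive equals
        return Objects.hash(concept, entity, period, unit, value.stripTrailingZeros());
    }
}
```

A fact built by an XML reader and one built by `OimFact.fromCsvRow(...)` compare equal when they describe the same concept, entity, period, unit, and value, which is the point of mapping every format onto one model.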
Wrong: "The company sent me their financial report as an .xml file, so it's an XBRL instance."
Right: In XBRL, .xml files are usually linkbases (metadata). Instances should be .xbrl files.
Caveat: Some tools save instances as .xml files anyway (poor practice but common). Check the root element:
<xbrl> or <xbrli:xbrl> → Instance (data)
<linkbase> → Linkbase (metadata)
<schema> → Schema (metadata)
Wrong: "xBRL-JSON instances need a JSON Schema file."
Right: xBRL-JSON instances use the traditional XBRL taxonomy (.xsd + .xml files), not JSON Schema. The taxonomy is the same regardless of instance format.
Wrong: "This label.xml file contains data about labels, so it's a data file."
Right: Linkbases contain metadata ABOUT concepts. They're not instance documents. They're part of the taxonomy that defines what concepts mean and how they relate.
Wrong: "Taxonomy files have .xsd extension, so any .xsd file is a taxonomy."
Right: While taxonomy schemas are .xsd files, you need BOTH .xsd AND .xml linkbase files to have a complete taxonomy. The .xsd alone is incomplete.
Wrong: "I need a different taxonomy for JSON vs XML instances."
Right: The taxonomy (.xsd + .xml) is the SAME regardless of instance format. Only the instance format changes.
Step 1: Check file extension
│
├─ .xbrl
│ └─ Instance document (data)
│ Format: xBRL-XML
│
├─ .json
│ ├─ Check content
│ │ ├─ Has "documentType": "...xbrl-json"
│ │ │ └─ Instance document (data)
│ │ │ Format: xBRL-JSON
│ │ │
│ │ └─ Has "documentType": "...xbrl-csv"
│ │ └─ CSV metadata (part of instance)
│ │ Format: xBRL-CSV
│ │
│ └─ Has "$schema"
│ └─ JSON Schema (not XBRL)
│
├─ .csv
│ └─ CSV data (part of xBRL-CSV instance)
│
├─ .xsd
│ └─ Taxonomy schema (metadata)
│ Part of taxonomy
│
├─ .xml
│ ├─ Check root element
│ │ ├─ <xbrl> or <xbrli:xbrl>
│ │ │ └─ Instance document (data)
│ │ │ (Poor naming practice, should be .xbrl)
│ │ │
│ │ ├─ <linkbase> or <link:linkbase>
│ │ │ └─ Linkbase file (metadata)
│ │ │ Part of taxonomy
│ │ │
│ │ └─ <schema> or <xs:schema>
│ │ └─ Schema file (metadata)
│ │ Part of taxonomy
│ │ (Poor naming practice, should be .xsd)
│
└─ .html or .xhtml
├─ Has xmlns:ix="...inlineXBRL"
│ └─ Inline XBRL instance (data)
│ Format: Inline XBRL
│
└─ Regular HTML (not XBRL)
| Extension | Content Type | Metadata or Data? | Notes |
|---|---|---|---|
| .xbrl | xBRL-XML instance | Data | Proper extension for instances |
| .json | xBRL-JSON or xBRL-CSV metadata | Data | Check documentType |
| .csv | xBRL-CSV tables | Data | Always paired with .json |
| .xsd | Taxonomy schema | Metadata | Defines concepts and types |
| .xml | Linkbase | Metadata | Usually linkbase; check root element |
| .html .xhtml | Inline XBRL | Data | Human-readable instances |
us-gaap-2024/
│
├── METADATA (Taxonomy):
│ ├── us-gaap-2024.xsd ← Concept definitions
│ ├── us-gaap-2024-label.xml ← English labels
│ ├── us-gaap-2024-label-es.xml ← Spanish labels
│ ├── us-gaap-2024-presentation.xml ← Display structure
│ ├── us-gaap-2024-calculation.xml ← Summation rules
│ ├── us-gaap-2024-definition.xml ← Dimensional structure
│ └── us-gaap-2024-reference.xml ← Accounting standards references
│
└── DATA (Instance) - separate files:
├── acme-corp-2024-q1.xbrl ← Q1 data (xBRL-XML)
├── acme-corp-2024-q2.html ← Q2 data (Inline XBRL)
├── acme-corp-2024-q3.json ← Q3 data (xBRL-JSON)
└── acme-corp-2024-annual/
├── metadata.json ← Annual data (xBRL-CSV)
└── financial-data.csv
All instances use the SAME taxonomy (us-gaap-2024/)
acme-extension-2024/
│
├── METADATA (Extension Taxonomy):
│ ├── acme-extension-2024.xsd ← Custom concepts
│ ├── acme-extension-2024-label.xml ← Labels for custom concepts
│ ├── acme-extension-2024-presentation.xml
│ └── acme-extension-2024-calculation.xml
│
└── References US GAAP:
└── Import: http://fasb.org/us-gaap/2024/us-gaap-2024.xsd
DATA (Instance):
└── acme-annual-report-2024.xbrl
├── References: acme-extension-2024.xsd
│ └── Which imports: us-gaap-2024.xsd
└── Contains facts using both:
├── Standard US GAAP concepts
└── Acme-specific extension concepts
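The "extension imports base taxonomy" chain can be followed mechanically. As a hedged sketch, the helper below lists the `schemaLocation` of every `xs:import` in a schema using the JDK DOM parser; a processor could repeat this on each discovered schema to walk from the extension to US GAAP. `SchemaImports` is a hypothetical name, and real taxonomy discovery also follows `linkbaseRef` elements and resolves relative URLs, which is omitted here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: finds which other schemas a taxonomy schema imports.
final class SchemaImports {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Returns the schemaLocation of each xs:import, in document order. */
    static List<String> importedSchemaLocations(String schemaXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Plain DOM parsing: the imported schemas themselves are not fetched.
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("schema is not parseable XML", e);
        }

        List<String> locations = new ArrayList<>();
        NodeList imports = doc.getElementsByTagNameNS(XSD_NS, "import");
        for (int i = 0; i < imports.getLength(); i++) {
            String location = ((Element) imports.item(i)).getAttribute("schemaLocation");
            if (!location.isEmpty()) {
                locations.add(location);
            }
        }
        return locations;
    }
}
```

Applied to acme-extension-2024.xsd, this would be expected to return the US GAAP schema URL shown in the tree above.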
sec-filing-2024-q4.zip
│
├── reports/
│ └── acme-10k-2024.html ← DATA (Inline XBRL instance)
│
└── taxonomy/
├── acme-extension/
│ ├── acme-2024.xsd ← METADATA (Extension taxonomy)
│ ├── acme-2024-label.xml
│ ├── acme-2024-presentation.xml
│ └── acme-2024-calculation.xml
│
└── us-gaap-2024/ ← METADATA (Base taxonomy)
├── us-gaap-2024.xsd
├── us-gaap-2024-label.xml
└── ...
Instance: acme-10k-2024.html (data)
Taxonomy: Everything in taxonomy/ directory (metadata)
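import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class XBRLFileIdentifier {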
public enum XBRLFileType {
INSTANCE_XBRL_XML,
INSTANCE_INLINE_XBRL,
INSTANCE_XBRL_JSON,
INSTANCE_XBRL_CSV_METADATA,
INSTANCE_XBRL_CSV_DATA,
TAXONOMY_SCHEMA,
TAXONOMY_LINKBASE,
UNKNOWN
}
public static XBRLFileType identifyFile(File file)
throws IOException {
String filename = file.getName().toLowerCase();
// Check by extension first
if (filename.endsWith(".xbrl")) {
return XBRLFileType.INSTANCE_XBRL_XML;
}
if (filename.endsWith(".xsd")) {
return XBRLFileType.TAXONOMY_SCHEMA;
}
if (filename.endsWith(".csv")) {
return XBRLFileType.INSTANCE_XBRL_CSV_DATA;
}
if (filename.endsWith(".json")) {
return identifyJSONFile(file);
}
if (filename.endsWith(".xml")) {
return identifyXMLFile(file);
}
if (filename.endsWith(".html") || filename.endsWith(".xhtml")) {
return identifyHTMLFile(file);
}
return XBRLFileType.UNKNOWN;
}
private static XBRLFileType identifyJSONFile(File file)
throws IOException {
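// Read the whole file as text (acceptable for a sketch; very large
// files would be better handled with a streaming JSON parser)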
String content = Files.readString(file.toPath(),
StandardCharsets.UTF_8);
// Check for xBRL-JSON marker
if (content.contains("\"documentType\"") &&
content.contains("xbrl-json")) {
return XBRLFileType.INSTANCE_XBRL_JSON;
}
// Check for xBRL-CSV marker
if (content.contains("\"documentType\"") &&
content.contains("xbrl-csv")) {
return XBRLFileType.INSTANCE_XBRL_CSV_METADATA;
}
return XBRLFileType.UNKNOWN;
}
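private static XBRLFileType identifyXMLFile(File file)
throws IOException {
// Parse XML to check root element
Document doc;
try {
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
doc = builder.parse(file);
} catch (javax.xml.parsers.ParserConfigurationException
| org.xml.sax.SAXException e) {
// Not parseable as XML, so it cannot be an XBRL XML document
return XBRLFileType.UNKNOWN;
}
Element root = doc.getDocumentElement();
String rootName = root.getLocalName();
// getNamespaceURI() returns null when the root element has no namespace
String namespace = root.getNamespaceURI();
if (namespace == null) {
return XBRLFileType.UNKNOWN;
}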
// Check for XBRL instance
if ("xbrl".equals(rootName) &&
namespace.contains("xbrl.org/2003/instance")) {
return XBRLFileType.INSTANCE_XBRL_XML;
}
// Check for linkbase
if ("linkbase".equals(rootName) &&
namespace.contains("xbrl.org/2003/linkbase")) {
return XBRLFileType.TAXONOMY_LINKBASE;
}
// Check for schema
if ("schema".equals(rootName) &&
namespace.equals("http://www.w3.org/2001/XMLSchema")) {
return XBRLFileType.TAXONOMY_SCHEMA;
}
return XBRLFileType.UNKNOWN;
}
private static XBRLFileType identifyHTMLFile(File file)
throws IOException {
// Read HTML content
String content = Files.readString(file.toPath(),
StandardCharsets.UTF_8);
// Check for Inline XBRL namespace
if (content.contains("xmlns:ix=") &&
content.contains("inlineXBRL")) {
return XBRLFileType.INSTANCE_INLINE_XBRL;
}
return XBRLFileType.UNKNOWN;
}
public static boolean isInstanceDocument(File file)
throws IOException {
XBRLFileType type = identifyFile(file);
return type == XBRLFileType.INSTANCE_XBRL_XML ||
type == XBRLFileType.INSTANCE_INLINE_XBRL ||
type == XBRLFileType.INSTANCE_XBRL_JSON ||
type == XBRLFileType.INSTANCE_XBRL_CSV_METADATA;
}
public static boolean isTaxonomyDocument(File file)
throws IOException {
XBRLFileType type = identifyFile(file);
return type == XBRLFileType.TAXONOMY_SCHEMA ||
type == XBRLFileType.TAXONOMY_LINKBASE;
}
}
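import java.io.File;
import java.util.ArrayList;
import java.util.List;
// Taxonomy, Instance, and the load...Instance helper methods are application
// classes that are not shown here. XBRLFileType refers to the nested enum
// XBRLFileIdentifier.XBRLFileType.
public class XBRLProcessor {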
private Taxonomy taxonomy;
private Instance instance;
public void loadFiles(List<File> files) throws Exception {
// Separate files by type
List<File> taxonomyFiles = new ArrayList<>();
List<File> instanceFiles = new ArrayList<>();
for (File file : files) {
if (XBRLFileIdentifier.isTaxonomyDocument(file)) {
taxonomyFiles.add(file);
} else if (XBRLFileIdentifier.isInstanceDocument(file)) {
instanceFiles.add(file);
}
}
// Load taxonomy first (metadata)
System.out.println("Loading taxonomy (metadata)...");
taxonomy = new Taxonomy();
for (File taxFile : taxonomyFiles) {
XBRLFileType type = XBRLFileIdentifier.identifyFile(taxFile);
if (type == XBRLFileType.TAXONOMY_SCHEMA) {
System.out.println(" Loading schema: " + taxFile.getName());
taxonomy.loadSchema(taxFile);
} else if (type == XBRLFileType.TAXONOMY_LINKBASE) {
System.out.println(" Loading linkbase: " + taxFile.getName());
taxonomy.loadLinkbase(taxFile);
}
}
// Load instance (data)
System.out.println("Loading instance (data)...");
for (File instFile : instanceFiles) {
XBRLFileType type = XBRLFileIdentifier.identifyFile(instFile);
switch (type) {
case INSTANCE_XBRL_XML:
System.out.println(" Loading xBRL-XML instance: " +
instFile.getName());
instance = loadXBRLXMLInstance(instFile, taxonomy);
break;
case INSTANCE_INLINE_XBRL:
System.out.println(" Loading Inline XBRL instance: " +
instFile.getName());
instance = loadInlineXBRLInstance(instFile, taxonomy);
break;
case INSTANCE_XBRL_JSON:
System.out.println(" Loading xBRL-JSON instance: " +
instFile.getName());
instance = loadXBRLJSONInstance(instFile, taxonomy);
break;
case INSTANCE_XBRL_CSV_METADATA:
System.out.println(" Loading xBRL-CSV instance: " +
instFile.getName());
instance = loadXBRLCSVInstance(instFile, taxonomy);
break;
}
}
System.out.println("Loading complete.");
System.out.println("Taxonomy concepts: " +
taxonomy.getConceptCount());
// instance stays null when the file list contained no instance document
if (instance != null) {
System.out.println("Instance facts: " +
instance.getFactCount());
}
}
}
1. Use Proper File Extensions
// GOOD - Clear file extensions
taxonomy/
├── concepts.xsd ← Schema
├── labels.xml ← Linkbase
└── calculations.xml ← Linkbase
instances/
├── report-2024.xbrl ← xBRL-XML instance
├── report-2024.html ← Inline XBRL instance
└── report-2024.json ← xBRL-JSON instance
// BAD - Ambiguous extensions
files/
├── concepts.xml ← Is this a schema, a linkbase, or an instance?
├── report.xml ← Is this linkbase or instance?
2. Always Check Content, Not Just Extension
// DON'T rely solely on extension
if (filename.endsWith(".xml")) {
// Is this linkbase or instance? Must check!
}
// DO check content
XBRLFileType type = identifyFile(file);
if (type == XBRLFileType.TAXONOMY_LINKBASE) {
loadLinkbase(file);
} else if (type == XBRLFileType.INSTANCE_XBRL_XML) {
loadInstance(file);
}
3. Document File Roles in Packages
README.md:
This package contains:
TAXONOMY (Metadata):
- my-taxonomy.xsd: Concept definitions
- my-taxonomy-label.xml: English labels
- my-taxonomy-calculation.xml: Summation relationships
INSTANCES (Data):
- Q1-report.xbrl: First quarter data (xBRL-XML format)
- Q2-report.html: Second quarter data (Inline XBRL format)
- Q3-report.json: Third quarter data (xBRL-JSON format)
XBRL separates metadata (what CAN be reported) from data (what IS reported).
Metadata (Taxonomy):
What: Concept definitions, labels, relationships
Format: .xsd (schemas) + .xml (linkbases using XLink)
Changes: Infrequently (yearly)
Data (Instance):
What: Actual facts, values
Format: .xbrl, .html, .json, .csv
Changes: Each reporting period
| File Type | Extension | Role | Uses |
|---|---|---|---|
| XBRL Schema | .xsd | Metadata | Concept definitions |
| Linkbase | .xml | Metadata | Labels, relationships via XLink |
| xBRL-XML Instance | .xbrl | Data | Financial data in XML |
| Inline XBRL | .html/.xhtml | Data | Financial data in HTML |
| xBRL-JSON | .json | Data | Financial data in JSON |
| xBRL-CSV Metadata | .json | Data | CSV structure definition |
| xBRL-CSV Data | .csv | Data | Tabular financial data |
❌ Don't assume .xml = data (it's usually metadata in XBRL)
❌ Don't assume you need different taxonomies for different formats
❌ Don't assume .xsd files are sufficient alone (need linkbases too)
❌ Don't assume JSON instances use JSON Schema (they use XBRL taxonomies)
✅ Do understand taxonomy = .xsd + .xml linkbases (metadata)
✅ Do understand instances = .xbrl / .html / .json / .csv (data)
✅ Do check file content, not just extension
✅ Do remember XLink/XPointer enable taxonomy relationships
XBRL ECOSYSTEM
═══════════════
┌───────────────────────────────────────┐
│ TAXONOMY (Metadata) │
│ One taxonomy, many uses │
│ │
│ .xsd files → Concept definitions │
│ .xml files → Labels, relationships │
│ (using XLink/XPointer) │
└───────────────────────────────────────┘
│
│ defines structure for
│
┌───────────┼───────────┬───────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ .xbrl │ │ .html │ │ .json │ │.json+.csv│
│ │ │ │ │ │ │ │
│xBRL-XML │ │ Inline │ │xBRL-JSON│ │xBRL-CSV │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
INSTANCES (Data)
Different formats, same taxonomy
The XBRL architecture may seem confusing at first, but it's actually elegant:
One taxonomy: .xsd + .xml to define concepts and relationships
Many instance formats: .xbrl, .html, .json, .csv
Simple rule: .xsd + .xml = metadata, others = data
Remember:
Traditional XML: .xml = data, .xsd = metadata
XBRL: .xsd + .xml = metadata (taxonomy), .xbrl/.html/.json/.csv = data (instance)
Understanding this distinction is fundamental to working with XBRL effectively.
This document clarifies the metadata vs data architecture in XBRL and related formats as of January 2026, addressing the most common source of confusion in XBRL implementation.