Five-Star Data Model for AI

Artificial intelligence thrives on high-quality data. Yet many organisations still keep valuable information buried in formats that frustrate machine learning and analytics. Tim Berners-Lee’s five-star data model, originally created to promote open data on the web, can be adapted into a clear, practical roadmap for preparing internal company data so it can fuel AI systems.

One Star: Unstructured and Hard to Mine ★

At the bottom of the scale sit PDFs, reports, and scanned documents. They often hold rich histories and insights, but to AI systems they are black boxes. OCR and natural language processing can help, but extracting consistent, reliable features from them is costly and time-consuming.
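
To make that cost concrete, here is a minimal sketch that assumes the text has already been pulled out of a PDF by an OCR or text-extraction tool (not shown). The regular expression and the field names (`invoice_no`, `total_eur`) are hypothetical. The point is how brittle pattern-based extraction is: a second document of the same kind, laid out differently, yields nothing.

```python
import re
from typing import Optional

# Hypothetical pattern tuned to one specific invoice layout; any change in
# wording, language or formatting in the source documents breaks it.
INVOICE_PATTERN = re.compile(
    r"Invoice\s+No\.?\s*(?P<invoice_no>[A-Z0-9-]+).*?"
    r"Total:\s*EUR\s*(?P<total>[\d,]+\.\d{2})",
    re.DOTALL,
)

def extract_invoice(text: str) -> Optional[dict]:
    """Pull two fields out of OCR'd text, or return None if the layout differs."""
    match = INVOICE_PATTERN.search(text)
    if match is None:
        return None
    return {
        "invoice_no": match.group("invoice_no"),
        # Strip thousands separators before converting to a number.
        "total_eur": float(match.group("total").replace(",", "")),
    }

page_a = "ACME Ltd\nInvoice No. INV-2023-017\n...\nTotal: EUR 1,250.00"
page_b = "ACME Ltd\nRechnung INV-2023-018\n...\nSumme: 980,00 EUR"

print(extract_invoice(page_a))  # {'invoice_no': 'INV-2023-017', 'total_eur': 1250.0}
print(extract_invoice(page_b))  # None: same kind of document, different layout
```

Every new layout means another pattern to write and maintain, which is why this tier is the most expensive starting point for AI work.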

Two Stars: Structured but Trapped ★★

Excel spreadsheets and similar tables introduce basic structure. Machines can read the cells, but the meaning of columns, units, and relationships remains locked in human context. Data scientists still need to clean and transform the data before model training.
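
A typical first cleaning step is to make that human context explicit. The sketch below assumes a sheet that has already been read into rows of strings (for example with a spreadsheet library, not shown). The header labels, canonical names (`weight_kg`, `price_eur`) and unit conversions in `HEADER_MAP` are hypothetical examples of knowledge a data scientist would otherwise have to reconstruct by asking colleagues.

```python
from typing import Callable

# Hypothetical mapping from free-form spreadsheet headers to a canonical
# column name plus a converter that normalises the unit.
HEADER_MAP: dict[str, tuple[str, Callable[[str], float]]] = {
    "Weight (lbs)": ("weight_kg", lambda v: float(v) * 0.45359237),
    "Weight kg": ("weight_kg", float),
    "Price (€k)": ("price_eur", lambda v: float(v) * 1000),
}

def normalise(rows: list[list[str]]) -> list[dict[str, float]]:
    """Turn a header row plus data rows into records with canonical names and units."""
    header, *data = rows
    records = []
    for row in data:
        record = {}
        for label, raw in zip(header, row):
            if label not in HEADER_MAP:
                # Refuse to guess: an undocumented column is a question for a human.
                raise KeyError(f"No documented meaning for column {label!r}")
            name, convert = HEADER_MAP[label]
            record[name] = round(convert(raw), 3)
        records.append(record)
    return records

sheet = [["Weight (lbs)", "Price (€k)"], ["10", "2.5"]]
print(normalise(sheet))  # [{'weight_kg': 4.536, 'price_eur': 2500.0}]
```

Raising on an unknown header, rather than passing it through, keeps silent unit mix-ups out of the training data.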

Three Stars: Machine-Readable Formats ★★★

CSV files and SQL databases provide well-understood, machine-readable formats. AI pipelines can ingest these with minimal friction, yet external documentation such as data dictionaries is still required to explain column meanings, business rules, and schema changes.
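
Loading such a file is easy; knowing what it means is the part that still lives elsewhere. As a minimal sketch, the data dictionary can travel with the pipeline as code and be checked on every load. The column names, types and the `DATA_DICTIONARY` structure are hypothetical; in practice this information often sits in a wiki or a schema registry.

```python
import csv
import io

# Hypothetical data dictionary: column -> (type, human-readable meaning).
DATA_DICTIONARY = {
    "customer_id": (str, "Internal customer key"),
    "order_date": (str, "ISO 8601 date of the order"),
    "amount_eur": (float, "Order value in euros"),
}

def load_orders(csv_text: str) -> list[dict]:
    """Parse CSV text and coerce each documented column to its documented type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail fast if the file no longer matches the documented schema.
    missing = set(DATA_DICTIONARY) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Columns missing from file: {sorted(missing)}")
    rows = []
    for raw in reader:
        rows.append({col: typ(raw[col]) for col, (typ, _) in DATA_DICTIONARY.items()})
    return rows

sample = "customer_id,order_date,amount_eur\nC001,2024-03-01,119.00\n"
print(load_orders(sample))
# [{'customer_id': 'C001', 'order_date': '2024-03-01', 'amount_eur': 119.0}]
```

A schema check like this catches a renamed or dropped column at load time instead of during model training.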

Four Stars: Documented APIs ★★★★

A documented, stable API marks a major leap forward. AI applications can pull fresh data automatically, confident in its structure and update frequency. This enables real-time analytics, automated retraining, and scalable deployment across teams and products.
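
What the documentation buys is a contract the client can rely on and verify. The sketch below assumes a hypothetical endpoint (`BASE_URL`) that returns JSON records changed since a given timestamp; `fetch_changes`, the `updated_since` parameter and the field names are illustrative, not a real service. Only the standard library is used, and parsing is kept separate from the network call so it can be tested offline.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Hypothetical endpoint; replace with the organisation's documented API.
BASE_URL = "https://api.example.com/v1/orders"

@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    amount_eur: float
    updated_at: str  # ISO 8601 timestamp, as promised by the (hypothetical) API docs

def parse_orders(payload: str) -> list[Order]:
    """Build typed records from a response body; a missing field raises KeyError."""
    body = json.loads(payload)
    return [
        Order(
            order_id=str(item["order_id"]),
            customer_id=str(item["customer_id"]),
            amount_eur=float(item["amount_eur"]),
            updated_at=str(item["updated_at"]),
        )
        for item in body["data"]
    ]

def fetch_changes(since: str) -> list[Order]:
    """Pull only records changed since the last run, e.g. from a scheduled retraining job."""
    query = urllib.parse.urlencode({"updated_since": since})
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=10) as resp:
        return parse_orders(resp.read().decode("utf-8"))

sample = ('{"data": [{"order_id": "O-1", "customer_id": "C001", '
          '"amount_eur": 119.0, "updated_at": "2024-03-01T10:00:00Z"}]}')
print(parse_orders(sample)[0].amount_eur)  # 119.0
```

Incremental pulls keyed on a timestamp are one common way to keep models fresh without re-downloading everything; the exact mechanism depends on what the API actually documents.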

Five Stars: Interlinked, Context-Rich Data ★★★★★

The gold standard is data that not only flows through APIs but also connects to related datasets. Linked data creates a web of context: customer records tied to transactions, sensors to maintenance logs, or product IDs to global taxonomies. For AI, these relationships enable advanced reasoning, richer feature engineering, and more accurate predictions.
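
A small illustration of why the links matter for feature engineering: once customers, transactions and products share identifiers, and product IDs map onto a shared taxonomy, a per-customer feature such as spend by category falls out of a few lookups. The records, IDs and category names below are invented for illustration, and plain dictionaries stand in for what would typically be RDF, a knowledge graph or joined database tables.

```python
from collections import defaultdict

# Invented example data; each dataset references the others by ID.
customers = {"C001": {"name": "Alpha GmbH"}, "C002": {"name": "Beta Ltd"}}
transactions = [
    {"customer_id": "C001", "product_id": "P10", "amount_eur": 120.0},
    {"customer_id": "C001", "product_id": "P20", "amount_eur": 80.0},
    {"customer_id": "C002", "product_id": "P10", "amount_eur": 60.0},
]
# Product IDs linked to a (hypothetical) shared taxonomy.
product_taxonomy = {"P10": "tools/power-drills", "P20": "safety/gloves"}

def spend_by_category(customer_id: str) -> dict[str, float]:
    """Follow customer -> transactions -> product -> taxonomy links to build a feature."""
    if customer_id not in customers:
        raise KeyError(customer_id)
    totals: dict[str, float] = defaultdict(float)
    for tx in transactions:
        if tx["customer_id"] == customer_id:
            # Use the top level of the taxonomy path as the category.
            top_level = product_taxonomy[tx["product_id"]].split("/")[0]
            totals[top_level] += tx["amount_eur"]
    return dict(totals)

print(spend_by_category("C001"))  # {'tools': 120.0, 'safety': 80.0}
```

Without the product-to-taxonomy link, the same feature would require a manual mapping exercise for every new product.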

From Documents to Intelligence

Climbing the star ladder is essential to unlocking AI’s full value. Each step reduces manual cleaning and increases the ability to build models that are timely and reliable.

Organisations that invest in moving their data up the scale position themselves to make AI a strategic asset rather than an experimental add-on. After all, a model is only as good as the data it is trained on or has access to.