data
5 concepts in this domain
-
Annotation
Type: Process
The process of adding labels, tags, or metadata to data to make it suitable for supervised learning or evaluation. For LLMs, annotation includes labeling text for classification, rating response quali...
Also: Data Labeling, Labeling, Tagging
-
Dataset
Type: Artifact
A structured collection of data used for training, fine-tuning, or evaluating AI models. Datasets can contain text, images, labels, or structured records. In the LLM context, datasets are used for pre...
Also: Data Set, Corpus, Data Collection
-
Human Feedback
Type: Artifact
Data capturing human judgments about AI outputs: preferences between responses, quality ratings, safety flags, or corrections. Human feedback is the key input for RLHF and related alignment techniques....
Also: Human Preference Data, Preference Labels, Human Evaluation
-
Synthetic Data
Type: Artifact
Data generated by AI models rather than collected from real-world sources. Synthetic data is increasingly used to train and fine-tune LLMs, especially when real data is scarce, expensive, or raises pr...
Also: Generated Data, AI-Generated Data, Artificial Data
-
Training Data
Type: Artifact
The text corpus used to train an LLM's weights during pre-training. Training data determines what the model "knows": its vocabulary, facts, reasoning patterns, and biases all come from the training cor...
Also: Pre-training Data, Training Corpus
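
To make these definitions concrete, here is a minimal Python sketch of how an annotated dataset and a single piece of human feedback might be represented. The names `LabeledExample`, `PreferencePair`, and `label_counts` are hypothetical and chosen for illustration only. They do not come from this glossary or from any particular library, and real pipelines usually carry more metadata (annotator IDs, timestamps, guidelines version).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledExample:
    """One annotated item: raw text plus a label added by an annotator."""
    text: str
    label: str


@dataclass(frozen=True)
class PreferencePair:
    """One unit of human feedback: which of two responses a rater preferred."""
    prompt: str
    chosen: str
    rejected: str


def label_counts(dataset):
    """Count how often each label occurs in a list of LabeledExample records."""
    return Counter(example.label for example in dataset)


# A tiny dataset: a structured collection of annotated records (made-up data).
dataset = [
    LabeledExample("The battery lasts all day.", "positive"),
    LabeledExample("The screen cracked in a week.", "negative"),
    LabeledExample("Setup took two minutes.", "positive"),
]

# A single preference judgment between two candidate responses (made-up data).
feedback = PreferencePair(
    prompt="Summarize the review in one sentence.",
    chosen="The reviewer praises the battery life.",
    rejected="Battery.",
)

counts = label_counts(dataset)
```

Checking the label distribution with something like `label_counts` is a common first sanity check on annotated data, since a heavily skewed distribution can affect both training and evaluation.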