// the datavedas project

Data Science, Structured.

DataVedas is an ongoing effort to map the world of data science by bringing together concepts from mathematics, statistics, programming, machine learning, and AI in a structured and connected way.

// why datavedas

A modern interpretation of an old idea.

Data science knowledge is spread across books, research papers, tutorials, documentation, and industry practice. Inspired by how Vyasa compiled the Vedas, DataVedas brings these ideas together into a connected structure that can be explored as a whole.

// from collection to connection

// the path

Building Knowledge, One Layer at a Time

Each section combines theory with practical implementation in Python, R, and other tools. The sequence reflects how understanding develops in practice, with each section building upon concepts introduced before it.

01 Basic Statistics

The concepts you must grasp before playing with data.

02 Data Exploration & Prep

Explore, clean and shape data into model-ready form.

03 Modeling

Algorithms explained, with code to run them in Python & R.

04 Evaluation & Validation

Measure whether a model is actually any good.

// theory

Understand the "why".

Before learning how a technique is implemented, it helps to understand why it exists. DataVedas introduces concepts through intuition, examples, visual explanations, and mathematical reasoning before moving to code.

Mean → average value
Median → middle observation
Mode → most frequent value
Outlier → shifts the mean

// theory, then application

Then run it.

Once the underlying ideas are clear, each concept is implemented using Python, R, and other tools through practical examples and runnable code.

descriptive_stats.py

# central tendency in three lines
import pandas as pd

# reading a legacy .xls file requires the xlrd package
df = pd.read_excel("diamonds.xls")

df["price"].mean()      # 3932.8
df["price"].median()    # 2401.0
df["price"].mode()[0]   # 605
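
The point that an outlier shifts the mean can also be checked without any file. The sketch below uses only Python's standard library and a small sample invented for illustration; the values, the names `prices` and `with_outlier`, and the helper `summarize` are assumptions, not taken from the diamonds data.

```python
import statistics

# Hypothetical sample, invented for illustration (not from diamonds.xls).
prices = [400, 500, 500, 600, 700]
with_outlier = prices + [18000]  # the same sample plus one extreme value


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


print(summarize(prices))        # mean 540, median 500, mode 500
print(summarize(with_outlier))  # mean 3450, median 550, mode 500
```

One extreme value moves the mean from 540 to 3450, while the median moves only from 500 to 550 and the mode stays at 500.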

// author's pick

Favourite reads

basic stats

Measures of Central Tendency

Mean, median and mode, and when each one actually describes your data best.

inferential

Hypothesis Testing

Null hypotheses, p-values and rejection regions, without the jargon.

modeling

Linear Regression

Fitting a straight line to quantify how one variable moves with another.

modeling

Principal Component Analysis

Compressing many correlated features into the few directions that matter.

evaluation

K-Fold Cross-Validation

Splitting data into k folds to estimate how well a model generalizes.

python

Descriptive Statistics in Python

Summarizing a real dataset with pandas, from loading it in to plotting.

// where to begin

Start at the foundation.

DataVedas is designed to be explored in sequence, with each section building upon the last. If you're unsure where to begin, start at the foundation and follow the path forward.

Explore the star map