Master Python, Statistics, and ML from zero to advanced with our comprehensive course.
By the end of this course, you'll be able to:
For the best learning experience, we recommend:
Practical Tip: Work in a virtual environment (venv/conda) and use Jupyter Notebook/JupyterLab or VS Code for interactive development.
Objective: Understand the difference between Data Science (complete flow: collection → cleaning → analysis → deployment) and Machine Learning (models that learn from data).
Simple Explanation:
Practical Example (high level):
Problem: predict real estate prices.
Flow: collect ads → clean columns (area, bedrooms, neighborhood) → explore relationship between area and price → train regression → evaluate error → explain relevant variables.
Objective: Learn basic Python concepts needed to manipulate data: types, lists, dictionaries, functions, packages and script/notebook execution.
Explanation:
Practical Example (code):
# element-wise arithmetic over two parallel lists
areas = [50, 75, 100]
prices = [150000, 200000, 300000]
# calculate price per m2
ppms = [p/a for p,a in zip(prices, areas)]
print(ppms) # [3000.0, 2666.666..., 3000.0]
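The objective above also mentions dictionaries and functions, which the list example does not touch. A minimal sketch of the same calculation using them follows; the `listings` data and the `price_per_m2` helper are made up for illustration.

```python
# Hypothetical listings stored as a list of dictionaries (one dict per row).
listings = [
    {"area": 50, "price": 150000},
    {"area": 75, "price": 200000},
    {"area": 100, "price": 300000},
]


def price_per_m2(listing):
    """Return the price per square metre for one listing."""
    return listing["price"] / listing["area"]


# Dictionary comprehension: map each area to its rounded price per m2.
ppm_by_area = {item["area"]: round(price_per_m2(item), 2) for item in listings}
print(ppm_by_area)
```

Wrapping the arithmetic in a function keeps the rule in one place, so it can be reused later on DataFrame rows or inside a pipeline.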
Objective: Understand central measures (mean, median, mode), dispersion (std, variance, IQR) and basic visualizations (histogram, boxplot, scatter).
Explanation:
Practical Example (pandas + matplotlib):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('imoveis.csv') # columns: area, preco
print(df['preco'].mean(), df['preco'].median())
df['preco'].hist()
plt.title('Distribution of prices')
plt.show()
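The example above covers only the central measures. As a rough sketch of the dispersion measures named in the objective, the snippet below uses a small made-up price series (not the contents of imoveis.csv) and flags outliers with the common 1.5 × IQR rule.

```python
import pandas as pd

# Made-up prices; the last value is deliberately extreme.
prices = pd.Series([150000, 200000, 300000, 250000, 900000])

mean = prices.mean()
median = prices.median()
std = prices.std()   # sample standard deviation (ddof=1)
var = prices.var()   # sample variance (ddof=1)

# Interquartile range: spread of the middle 50% of the data.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are what a boxplot draws as outliers.
outliers = prices[(prices < q1 - 1.5 * iqr) | (prices > q3 + 1.5 * iqr)]

print(mean, median, std, iqr)
print(outliers.tolist())
```

Note how the single extreme price pulls the mean well above the median, which is why the median and IQR are often preferred for skewed data such as prices.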
Objective: Learn reading, selection, filtering, aggregation, joins and missing value handling with pandas.
Key operations:
Practical Example (join & aggregation):
imoveis = pd.read_csv('imoveis.csv') # id_imovel, bairro, area, preco
bairros = pd.read_csv('bairros.csv') # bairro, renda_media
df = imoveis.merge(bairros, on='bairro', how='left')
agg = df.groupby('bairro').agg({'preco':'mean','area':'median'}).reset_index()
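The join example does not show missing value handling, which the objective also lists. One possible approach is sketched below on a tiny made-up DataFrame; the neighborhood names and the choice of median imputation are assumptions, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in every column.
df_demo = pd.DataFrame({
    "bairro": ["Centro", "Centro", "Norte", None],
    "area": [50.0, np.nan, 100.0, 80.0],
    "preco": [150000.0, 200000.0, np.nan, 240000.0],
})

# 1. Inspect: how many values are missing per column?
missing_counts = df_demo.isna().sum()
print(missing_counts)

# 2. Drop rows where the grouping key itself is missing.
cleaned = df_demo.dropna(subset=["bairro"]).copy()

# 3. Fill numeric gaps with the column median (robust to outliers).
for col in ["area", "preco"]:
    cleaned[col] = cleaned[col].fillna(cleaned[col].median())

print(cleaned)
```

Whether to drop or impute depends on how much data is missing and why; the point here is only the mechanics of `isna`, `dropna` and `fillna`.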
Objective: Present essential concepts of vectors, matrices, dot product and derivatives — enough to understand ML algorithms.
Explanation:
Practical Example (linear regression):
Linear regression: y = w0 + w1*x1 + w2*x2. Optimization (least squares) finds the weights w = (w0, w1, w2) that minimize sum((y - ŷ)^2), where ŷ is the model's prediction for each observation.
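To make the least-squares idea concrete, here is a small NumPy sketch. The data is synthetic, generated without noise from y = 1 + 2*x1 + 3*x2, so the recovered weights should match those values; `np.linalg.lstsq` is used as one convenient solver, not the only way to fit the model.

```python
import numpy as np

# Synthetic data from a known linear rule (assumption for illustration).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones lets w0 act as the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve min_w sum((y - X @ w)^2).
w, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ w                    # predictions are a matrix-vector (dot) product
sse = np.sum((y - y_hat) ** 2)   # sum of squared errors at the optimum

print("weights:", w)
print("SSE:", sse)
```

Each prediction is the dot product of a row of `X` with `w`, which is where the vector and matrix concepts from this module show up in practice.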
Objective: Learn techniques to prepare data for models: missing handling, categorical encoding, scaling and creating new features.
Explanation:
Practical Example (scikit-learn pipeline):
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
num_cols = ['area','idade']
cat_cols = ['bairro','tipo']
num_pipe = Pipeline([('impute', SimpleImputer(strategy='median')),
('scale', StandardScaler())])
cat_pipe = Pipeline([('impute', SimpleImputer(strategy='most_frequent')),
('ohe', OneHotEncoder(handle_unknown='ignore'))])
preproc = ColumnTransformer([('num', num_pipe, num_cols),
('cat', cat_pipe, cat_cols)])
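The preprocessor defined above is only constructed, not applied. The sketch below rebuilds the same transformer so it runs on its own, fits it on a tiny made-up training frame, and reuses it on a new row; the sample values and the unseen neighborhood "Sul" are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_cols = ["area", "idade"]
cat_cols = ["bairro", "tipo"]

num_pipe = Pipeline([("impute", SimpleImputer(strategy="median")),
                     ("scale", StandardScaler())])
cat_pipe = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                     ("ohe", OneHotEncoder(handle_unknown="ignore"))])
preproc = ColumnTransformer([("num", num_pipe, num_cols),
                             ("cat", cat_pipe, cat_cols)])

# Made-up training data with a few missing values.
train = pd.DataFrame({
    "area": [50.0, 75.0, np.nan, 100.0],
    "idade": [10.0, 5.0, 20.0, np.nan],
    "bairro": ["Centro", "Norte", "Centro", np.nan],
    "tipo": ["apto", "casa", "apto", "apto"],
})
# A new row containing a neighborhood that was never seen during fit.
new = pd.DataFrame({"area": [60.0], "idade": [8.0],
                    "bairro": ["Sul"], "tipo": ["casa"]})

X_train_t = preproc.fit_transform(train)  # learn medians, scales, categories
X_new_t = preproc.transform(new)          # reuse them; never refit on new data

print(X_train_t.shape, X_new_t.shape)
```

Because `handle_unknown='ignore'` is set, the unseen category is encoded as all zeros instead of raising an error, and both outputs keep the same number of columns.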
Objective: Understand and apply linear regression, logistic regression, decision trees and k-NN.
Explanation:
Practical Example (scikit-learn — linear regression):
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X = df[['area','idade_do_imovel']]
y = df['preco']
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
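# take the square root manually: the squared=False argument was deprecated and removed in recent scikit-learn releases
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)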
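The regression example covers only one of the four algorithms in the objective. As a rough sketch of the other three on a classification task, the snippet below uses a synthetic dataset from `make_classification`; the hyperparameters are arbitrary illustrative choices, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (assumption for illustration).
X_demo, y_demo = make_classification(n_samples=300, n_features=5,
                                     n_informative=3, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# All scikit-learn estimators share the same fit/predict interface.
scores = {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, clf.predict(X_te))

print(scores)
```

The shared interface is the main takeaway: swapping one algorithm for another is a one-line change, which makes quick baseline comparisons cheap.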
Objective: Learn K-means, DBSCAN and PCA for dimensionality reduction and exploration.
Explanation:
Practical Example (K-means + PCA):
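from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
X = df[['area','preco','renda_media']]
# standardize first: PCA and K-means are scale-sensitive, so preco would otherwise dominate
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_2d)
# plot the two principal components, colored by cluster (plt imported earlier)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=kmeans.labels_)
plt.show()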
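DBSCAN is named in the objective but not shown. A minimal sketch on hand-made 2D points follows; the coordinates and the `eps` and `min_samples` values are assumptions chosen so that the two dense groups and the isolated point are easy to see.

```python
import numpy as np
from sklearn.cluster import DBSCAN

points = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1],   # dense group A
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0], [5.1, 5.1],   # dense group B
    [10.0, 0.0],                                       # isolated point
])

# eps: neighborhood radius; min_samples: points needed to form a dense region.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points)

# Points that belong to no dense region receive the label -1 (noise).
print(labels)
```

Unlike K-means, DBSCAN does not need the number of clusters in advance and can leave points unassigned, which makes it useful for spotting anomalies during exploration.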
Objective: Learn metrics (RMSE, MAE, AUC, F1), cross-validation and hyperparameter tuning.
Explanation:
Practical Example (GridSearchCV):
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
param_grid = {'n_estimators':[50,100], 'max_depth':[None,10,20]}
gs = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=5, scoring='neg_root_mean_squared_error')
gs.fit(X_train, y_train)
# the scorer returns negative RMSE (higher is better), so negate it for reporting
print(gs.best_params_, -gs.best_score_)
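Grid search relies on cross-validation internally. To show cross-validation on its own, the sketch below scores a plain linear model with 5-fold MAE on synthetic data from `make_regression`; the dataset and model are stand-ins, not the course's housing data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem (assumption for illustration).
X_demo, y_demo = make_regression(n_samples=200, n_features=3,
                                 noise=10.0, random_state=42)

# scikit-learn scorers follow "higher is better", so errors come back negated.
neg_mae = cross_val_score(LinearRegression(), X_demo, y_demo,
                          cv=5, scoring="neg_mean_absolute_error")
mae_per_fold = -neg_mae

print("MAE per fold:", mae_per_fold)
print("mean:", mae_per_fold.mean(), "std:", mae_per_fold.std())
```

Reporting the spread across folds alongside the mean gives a sense of how stable the estimate is, which a single train/test split cannot show.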
Objective: Build reproducible pipelines, save models and apply transformations consistently.
Explanation:
Practical Example (save & load):
from joblib import dump, load
pipeline = Pipeline([('preproc', preproc), ('model', RandomForestRegressor())])
pipeline.fit(X_train, y_train)
dump(pipeline, 'modelo.joblib')
# in production
model = load('modelo.joblib')
preds = model.predict(X_new)
Objective: Understand neurons, layers, loss, backpropagation and train a basic network with PyTorch or TensorFlow.
Explanation:
Practical Example (PyTorch):
import torch
import torch.nn as nn
import torch.optim as optim
class SimpleNet(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_dim,64),
            nn.ReLU(),
            nn.Linear(64,2)
        )
    def forward(self,x):
        return self.fc(x)
model = SimpleNet(in_dim=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
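The snippet above defines the network, loss and optimizer but stops before training. A minimal training loop could look like the sketch below; the random inputs, the labeling rule (class 1 when the first feature is positive) and the number of steps are all assumptions made so the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # reproducible weights and data


class SimpleNet(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, x):
        return self.fc(x)


# Made-up data: 64 samples, 10 features, binary labels from a simple rule.
X_demo = torch.randn(64, 10)
y_demo = (X_demo[:, 0] > 0).long()  # CrossEntropyLoss expects integer class ids

model = SimpleNet(in_dim=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

losses = []
for step in range(200):
    optimizer.zero_grad()             # clear gradients from the previous step
    logits = model(X_demo)            # forward pass
    loss = criterion(logits, y_demo)  # compare predictions with labels
    loss.backward()                   # backpropagation: compute gradients
    optimizer.step()                  # update the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

This trains on the full dataset in one batch for simplicity; real projects usually iterate over mini-batches with a `DataLoader` and track loss on a separate validation set.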
Objective: Understand bagging and boosting and apply XGBoost/LightGBM for tabular data.
Explanation:
Practical Example (XGBoost):
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
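X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=42)
# hold out part of the training data for early stopping so the test set stays untouched
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
# early_stopping_rounds is a constructor argument in xgboost >= 1.6; passing it to fit() is deprecated and rejected by recent releases
model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.05, max_depth=6, early_stopping_rounds=10)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
pred = model.predict(X_test)
# take the square root manually: squared=False was removed from recent scikit-learn releases
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)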
Objective: Handle temporal data: decomposition, seasonality, ARIMA/SARIMA, Prophet and neural approaches (LSTM/Transformers).
Explanation:
Practical Example (decomposition):
from statsmodels.tsa.seasonal import seasonal_decompose
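# series: a pandas Series with a regular time index; period=12 assumes monthly data with yearly seasonality
result = seasonal_decompose(series, model='additive', period=12)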
result.plot()
Objective: Learn ways to put models into production, version, monitor and keep performance stable.
Explanation:
Practical Example (FastAPI + joblib):
from fastapi import FastAPI
from joblib import load
import pandas as pd
app = FastAPI()
model = load('modelo.joblib')
@app.post('/predict')
def predict(data: dict):
    # wrap the single JSON payload in a one-row DataFrame so the pipeline sees named columns
    df = pd.DataFrame([data])
    pred = model.predict(df)[0]
    return {'prediction': float(pred)}
Objective: Learn hypothesis testing, p-value, confidence intervals, power analysis and basics of Bayesian inference.
Explanation:
Practical Example (t-test):
from scipy.stats import ttest_ind
stat, p = ttest_ind(group_a['conversao'], group_b['conversao'])
print('p-value:', p)
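The objective also lists confidence intervals. One common way to build a 95% interval for a mean is with the t distribution, sketched below; the sample values are made up, and the approach assumes the observations are independent and roughly normally distributed.

```python
import numpy as np
from scipy import stats

# Made-up daily conversion rates (in percent) for one variant.
sample = np.array([2.1, 2.5, 1.9, 2.8, 2.4, 2.2, 2.6, 2.0])

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean (uses ddof=1)

# 95% confidence interval based on the t distribution with n - 1 degrees of freedom.
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

An interval conveys both the estimate and its uncertainty, which is often more informative than a p-value alone when comparing groups.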
Summary: You covered statistics, data manipulation, visualization, classical models, preprocessing, pipelines, deep learning and MLOps. Regular practice and real projects consolidate learning.