# Frequent Questions

### Bias vs Variance :laughing:

**Simple model**: high bias -> to lower the bias, move to a more complex model -> overfitting -> higher variance

**Complex model**: high variance -> to lower the variance, move to a simpler model -> underfitting -> higher bias

A high-order polynomial is a typical example of a more complex model.
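The tradeoff above can be sketched with a toy polynomial fit; the synthetic sine data and degree choices below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # noisy sine

mask = np.arange(30) % 3 != 0  # 2/3 of the points for training, 1/3 for testing

def errors(degree):
    coeffs = np.polyfit(x[mask], y[mask], degree)  # least-squares polynomial fit
    pred = np.polyval(coeffs, x)
    train_mse = np.mean((pred[mask] - y[mask]) ** 2)
    test_mse = np.mean((pred[~mask] - y[~mask]) ** 2)
    return train_mse, test_mse

simple_train, simple_test = errors(1)    # high bias: a line underfits the sine
complex_train, complex_test = errors(9)  # high variance: extra capacity chases noise
# More capacity always lowers *training* error; test error need not follow.
```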

### KNN :astonished:

To predict a new input instance's class (or regression value), find the K closest training data points and take a majority vote (or, for regression, an average).

* **Normalize** the features so that large-valued features don't dominate the distance and small-valued ones aren't ignored.
* **KD tree**, a recursive partitioning of the multi-dimensional space, to accelerate the search for the K nearest neighbors.
* **Weighted neighbors**: the closer a neighbor is, the larger its weight, making the prediction more sensitive to similarity.

![A KD-tree branch whose region intersects the circle around the query point, with radius equal to the current nearest distance, must be searched on both sides; if it does not intersect, the other half can be skipped](/files/-MXVUs__yijzNVk4KNu5)
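The ideas above (normalization plus a distance-weighted vote) can be sketched in a few lines of numpy; the data, query point, and `knn_predict` helper are hypothetical:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Distance-weighted KNN classification (a minimal sketch)."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every sample
    idx = np.argsort(d)[:k]                   # indices of the K closest points
    w = 1.0 / (d[idx] + 1e-9)                 # closer neighbor -> larger weight
    votes = {}
    for label, weight in zip(y_train[idx], w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)          # weighted majority vote

# Normalize features first so the large-valued second feature doesn't dominate.
X = np.array([[1.0, 100], [1.1, 110], [5.0, 500], [5.2, 520]])
y = np.array([0, 0, 1, 1])
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sigma
xq = (np.array([1.05, 105]) - mu) / sigma

print(knn_predict(Xn, y, xq, k=3))  # -> 0
```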

### K-means :face\_with\_monocle:

A way to put unlabeled data into K groups where data points in the same group are similar and data points in different groups are far apart.

* Choose K -> randomly initialize centers -> iteratively assign points to the nearest center and recalculate each center from its cluster's points.
* **Pros**:
  * Easy to understand, guaranteed to converge, scalable to large data, relatively fast
* **Cons**:
  * K must be set manually; sensitive to initialization and to outliers; only linear boundaries; O(n) per iteration
* **To improve**:
  * verify performance with different K and **select K at the turning point** (the elbow) of the loss curve
  * K-means++: **sequentially select initial centers** so that each new center is far away from the previously chosen ones.
  * Pre-processing to **normalize and filter outliers**
  * **Use kernel** to map data points into high dimensions. Then apply linear boundaries there.
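The assign/recompute loop above is plain Lloyd's algorithm; the `kmeans` helper and the two-blob data below are hypothetical, for illustration only:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random initialization (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its cluster (keep old if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.1, (5, 2)),    # blob near the origin
               rng.normal(10, 0.1, (5, 2))])  # blob far away
labels, centers = kmeans(X, k=2)
# Points from the same blob end up in the same cluster.
```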

### KNN vs K-means :thinking:

**Both** rely on distance measures: Euclidean/Manhattan distance (Minkowski distance with p = 2 / p = 1), cosine similarity, etc.

* KNN requires a label for every training sample in advance.
* K-means is an unsupervised learning method and doesn't require labels at all.
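The shared distance measures can be written out directly; the helper names below are illustrative:

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def minkowski(a, b, p):
    # Generalizes both: p = 1 -> Manhattan, p = 2 -> Euclidean.
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

def cosine_similarity(a, b):
    # Compares direction, ignoring magnitude.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(euclidean(a, b))          # sqrt(2)
print(manhattan(a, b))          # 2.0
print(cosine_similarity(a, b))  # 0.0, orthogonal vectors
```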

### Metrics :triumph:

**Positive/Negative**: the predicted results; **True/False**: whether the prediction is correct.

**TP, TN**: Correctly predicted positive and correctly predicted negative

**FP, FN**: Wrongly predicted positive and wrongly predicted negative

TP / (TP + FP) -> among all positive predictions, what fraction is correct: **precision**

TP / (TP + FN) -> among all positive ground truths, what fraction is found: **recall, true positive rate**

FP / (FP + TN) -> among all negative ground truths, what fraction is wrongly predicted positive: **false positive rate**

**ROC curve:** TPR plotted against FPR as the classification threshold varies.
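On a hypothetical set of labels and predictions, the definitions above reduce to a few numpy one-liners:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])  # ground truths
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 1])  # model predictions

tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly predicted positive
fp = np.sum((y_pred == 1) & (y_true == 0))  # wrongly predicted positive
fn = np.sum((y_pred == 0) & (y_true == 1))  # wrongly predicted negative
tn = np.sum((y_pred == 0) & (y_true == 0))  # correctly predicted negative

precision = tp / (tp + fp)  # 3 / 4
recall    = tp / (tp + fn)  # 3 / 4, the TPR
fpr       = fp / (fp + tn)  # 1 / 4
```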

### Bayes' Theorem :cowboy:

P(theta | x) = P(x | theta) \* P(theta) / P(x)

**P(x, theta)**: P(x | theta) \* P(theta), the joint probability

**P(x | theta)**: the probability of observing x given theta; viewed as a function of theta for fixed x, this is the likelihood.

**P(theta)**: prior probability

**P(theta | x)**: posterior probability

**Naivety**: the conditional probability is calculated as the pure product of the individual probabilities of components. This **implies the absolute independence of features** — a condition probably never met in real life.
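A minimal sketch of that naive product rule on a hypothetical two-feature spam example (Laplace smoothing is added to avoid zero probabilities):

```python
import numpy as np

# Features: [contains "free", contains "meeting"]; label 1 = spam.
X = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])

def naive_bayes_predict(x):
    scores = []
    for c in (0, 1):
        prior = np.mean(y == c)  # P(theta)
        # Naive assumption: P(x | theta) factorizes into per-feature terms.
        # Laplace smoothing: add 1 to counts, 2 to the denominator.
        p = (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
        likelihood = np.prod(np.where(x == 1, p, 1 - p))
        scores.append(prior * likelihood)  # proportional to the posterior
    return int(np.argmax(scores))

print(naive_bayes_predict(np.array([1, 0])))  # -> 1 (spam)
print(naive_bayes_predict(np.array([0, 1])))  # -> 0 (not spam)
```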

### L1 L2 Regularization

Only the loss term: **empirical** risk minimization

Loss + regularization: **structural** risk minimization

**L1** corresponds to a Laplacian prior on theta, **L2** to a Gaussian prior. <https://www.bilibili.com/video/BV1aE411L7sj?p=6&spm_id_from=pageDriver>

L2 regularization tends to spread the penalty across all weights, shrinking each a little; L1 drives many weights to exactly zero, giving sparse solutions.
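The sparsity difference can be seen by repeatedly applying each penalty's update to a weight vector. The `l1_step`/`l2_step` helpers below are illustrative, with the L1 update written as the standard soft-threshold (proximal) step:

```python
import numpy as np

def l2_step(w, lam, lr=0.1):
    # Gradient step on the L2 penalty (lam/2)*||w||^2: scales every weight
    # toward zero, but never makes it exactly zero.
    return w - lr * lam * w

def l1_step(w, lam, lr=0.1):
    # Soft-threshold (proximal) step on the L1 penalty lam*||w||_1:
    # weights below the threshold snap to exactly zero.
    return np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

w_l2 = np.array([3.0, 0.05, -0.02])
w_l1 = w_l2.copy()
for _ in range(100):
    w_l2 = l2_step(w_l2, lam=1.0)
    w_l1 = l1_step(w_l1, lam=0.1)

print(w_l1)  # small weights are exactly zero; the large one survives, shrunk
print(w_l2)  # every weight is tiny but still nonzero
```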

### Type I Error and Type II Error

Type I error: false positive

Type II error: false negative

### Fourier Transform

A Fourier transform converts a signal from time to frequency domain—it’s a very common way to extract features from audio signals or other time series such as sensor data.
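A minimal sketch with `numpy.fft`, recovering the dominant frequency of a synthetic two-tone signal:

```python
import numpy as np

fs = 100                      # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)   # one second of samples
# 5 Hz tone plus a weaker 20 Hz tone.
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.abs(np.fft.rfft(signal))        # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), 1 / fs)  # frequency of each bin
dominant = freqs[np.argmax(spectrum)]
print(dominant)  # -> 5.0, the strongest component
```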

### Likelihood

In statistics, the likelihood function (often simply called the likelihood) measures the goodness of fit of a statistical model to a sample of data for given values of the unknown parameters.
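For example, the likelihood of a hypothetical coin's bias p given i.i.d. flips is maximized at the sample mean:

```python
import numpy as np

# Coin flips: 7 heads out of 10.
flips = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])

def likelihood(p):
    """L(p) = P(data | p) for i.i.d. Bernoulli flips."""
    return np.prod(np.where(flips == 1, p, 1 - p))

# Grid search over candidate parameter values.
grid = np.linspace(0.01, 0.99, 99)
mle = grid[np.argmax([likelihood(p) for p in grid])]
print(mle)  # -> 0.7, the sample mean maximizes the likelihood
```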

### Generative vs Discriminative

A generative model learns the distribution of the data, while a discriminative model learns the boundary between different categories of data. Discriminative models often achieve better predictive performance on classification tasks.

