Frequent Questions



Bias vs Variance

Simple model, high bias -> to lower the bias, use a more complex model -> risk of overfitting -> higher variance

Complex model, high variance -> to lower the variance, simplify the model -> risk of underfitting -> higher bias

A high-order polynomial is a typical example of a more complex model: its degree controls where it sits on this trade-off (a quick sketch follows).

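A quick numpy sketch of the trade-off, using polynomial degree as a stand-in for model complexity (the data, degrees, and noise level here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Noisy samples of a smooth underlying function.
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, x_test.size)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # polynomial of given complexity
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Degree 1: both errors high (high bias, underfitting).
    # Degree 9: train error drops, test error typically grows (high variance, overfitting).
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```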
KNN

To predict a new input instance's class (or regression value), find the K closest training samples and take a majority vote (or average) of their labels; a minimal sketch follows this list.

  • Normalize the features so that features with large/small value ranges are not over/under-weighted in the distance.

  • Use a KD tree (a recursive partition of the multi-dimensional space) to accelerate the K-neighbor search: a branch whose boundary intersects the circle centered at the query with radius equal to the current nearest distance must be checked on both sides; if it doesn't intersect, the other half can be skipped.

  • Weight the neighbors: the closer a neighbor, the larger its weight, which makes the vote more sensitive to similarity.

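A minimal numpy sketch of KNN classification with the normalization and distance-weighted voting mentioned above (brute-force neighbor search and toy data for illustration; in practice a KD tree or a library implementation would be used):

```python
import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, x_query, k=5, weighted=False):
    """Classify x_query by a (possibly distance-weighted) vote of its k nearest neighbors."""
    # Normalize features to zero mean / unit variance so features with
    # large value ranges don't dominate the distance.
    mean, std = x_train.mean(axis=0), x_train.std(axis=0) + 1e-8
    train = (x_train - mean) / std
    query = (x_query - mean) / std

    dists = np.linalg.norm(train - query, axis=1)  # Euclidean distance
    nearest = np.argsort(dists)[:k]                # brute force: O(n); a KD tree would be faster

    if not weighted:
        return Counter(y_train[nearest]).most_common(1)[0][0]
    votes = {}
    for i in nearest:
        # The closer the neighbor, the larger its weight.
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] + 1e-8)
    return max(votes, key=votes.get)

# Toy usage: two Gaussian blobs labeled 0 and 1.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_predict(x, y, np.array([3.5, 3.5]), k=5, weighted=True))  # -> 1
```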
K-means

A way to put unlabeled data into K groups where data points in the same group are similar and data points in different groups are far apart.

  • Choose K -> randomly initialize centers -> iterate: assign each point to its nearest center, then recalculate each center from its cluster (a minimal sketch follows this list).

  • Pros:

    • Easy to understand, guaranteed to converge, scalable to large data, relatively fast

  • Cons:

    • K must be set manually, sensitive to initialization, sensitive to outliers, only linear boundaries, O(nK) per iteration

  • To improve:

    • Verify performance with different K and select K at the turning point (the "elbow")

    • K-means++: sequentially select initial centers so that each new center is far away from the previous ones

    • Pre-process the data to normalize features and filter outliers

    • Use a kernel to map data points into a higher-dimensional space, then apply linear boundaries there

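A minimal numpy sketch of Lloyd's algorithm with a K-means++-style initialization (toy data; empty clusters and other edge cases are not handled):

```python
import numpy as np

def kmeans(x, k, n_iters=100, seed=0):
    """Lloyd's algorithm with a K-means++-style initialization."""
    rng = np.random.default_rng(seed)

    # K-means++-style init: each new center is sampled with probability
    # proportional to its squared distance from the existing centers.
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    centers = np.array(centers)

    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center's cluster.
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        # Update step: each center becomes the mean of its cluster
        # (empty clusters are not handled in this sketch).
        new_centers = np.array([x[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy usage: two well-separated blobs.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
centers, labels = kmeans(data, k=2)
print(np.round(centers, 2))  # roughly (0, 0) and (5, 5)
```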
KNN vs K-means

Both rely on measuring distances: Euclidean/Manhattan distance (Minkowski distances with different powers), cosine similarity, etc.

  • KNN is supervised: it needs a label for every stored training sample ahead of time.

  • K-means is an unsupervised learning method and doesn't require labels at all.

Metrics

Positive/Negative refer to the predicted result; True/False refer to whether the prediction is correct.

TP, TN: correctly predicted positive and correctly predicted negative

FP, FN: wrongly predicted positive and wrongly predicted negative

TP / (TP + FP) -> among all positive predictions, how many are correct: precision

TP / (TP + FN) -> among all positive ground truths, how many are found: recall, true positive rate (TPR)

FP / (FP + TN) -> among all negative ground truths, how many are wrongly flagged: false positive rate (FPR)

ROC curve: TPR plotted against FPR as the decision threshold varies.

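The definitions above as a small self-contained helper (the counts in the usage example are made up):

```python
def binary_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)  # among all positive predictions, how many are correct
    recall = tp / (tp + fn)     # among all positive ground truths, how many are found (TPR)
    fpr = fp / (fp + tn)        # among all negative ground truths, how many are wrongly flagged
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, fpr, f1

# Hypothetical counts: 80 TP, 20 FP, 90 TN, 10 FN.
p, r, fpr, f1 = binary_metrics(tp=80, fp=20, tn=90, fn=10)
print(f"precision={p:.3f} recall={r:.3f} FPR={fpr:.3f} F1={f1:.3f}")
# precision=0.800 recall=0.889 FPR=0.182 F1=0.842
```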
Bayes' Theorem

P(theta | x) = P(x | theta) * P(theta) / P(x)

P(x, theta) = P(x | theta) * P(theta): joint probability

P(x | theta): the probability of observing x given theta; viewed as a function of theta, it is the likelihood of theta given the data x

P(theta): prior probability

P(theta | x): posterior probability

Naivety (as in Naive Bayes): the conditional probability is calculated as the pure product of the individual probabilities of the components. This assumes complete independence of the features, a condition probably never met in real life.

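A worked numeric example of Bayes' rule (all numbers are hypothetical): even a fairly accurate test for a rare condition yields a modest posterior, because the prior is small.

```python
# A test for a condition with 1% prevalence, 99% sensitivity, and a 5% false
# positive rate (all numbers made up for illustration).
p_sick = 0.01                 # prior P(theta)
p_pos_given_sick = 0.99       # likelihood P(x | theta)
p_pos_given_healthy = 0.05

# Evidence P(x) via the law of total probability.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Posterior P(theta | x) = P(x | theta) * P(theta) / P(x).
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(f"P(sick | positive) = {p_sick_given_pos:.3f}")  # -> 0.167
```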
L1 L2 Regularization

Only the loss term: empirical risk minimization

Loss + regularization: structural risk minimization

L2 regularization tends to spread weight smoothly across all terms, while L1 drives many weights to exactly zero, giving sparse solutions.

From a Bayesian view, L1 regularization corresponds to a Laplacian prior on the parameters, and L2 to a Gaussian prior.

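A small numpy sketch of the sparsity effect: gradient descent on a linear model with an L1 vs. L2 penalty (synthetic data; the lambda, learning rate, and step count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
# Linear data where only 2 of 10 features actually matter.
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[0], true_w[1] = 3.0, -2.0
y = X @ true_w + rng.normal(0, 0.1, size=200)

def fit(reg, lam=0.1, lr=0.01, steps=2000):
    w = np.zeros(10)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the squared loss
        if reg == "l2":
            grad += lam * w                # Gaussian prior: shrinks all weights
        else:
            grad += lam * np.sign(w)       # Laplacian prior: pushes weights to exactly 0
        w -= lr * grad
    return w

print("L2:", np.round(fit("l2"), 3))  # every weight shrunk a bit, few exactly zero
print("L1:", np.round(fit("l1"), 3))  # irrelevant weights driven to ~0 (sparse)
```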
Type I Error and Type II Error

Type I error: false positive

Type II error: false negative

Fourier Transform

A Fourier transform converts a signal from time to frequency domain—it’s a very common way to extract features from audio signals or other time series such as sensor data.

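A short numpy example: recover the dominant frequencies of a noisy signal from its magnitude spectrum (the sampling rate and component frequencies are made up):

```python
import numpy as np

fs = 1000                      # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)    # one second of signal
# Hypothetical signal: 50 Hz and 120 Hz components plus noise.
sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
sig += 0.2 * np.random.default_rng(0).normal(size=t.size)

spectrum = np.abs(np.fft.rfft(sig))        # magnitude spectrum (frequency domain)
freqs = np.fft.rfftfreq(t.size, d=1 / fs)  # frequency axis, Hz

# The two largest peaks recover the underlying components.
top = freqs[np.argsort(spectrum)[-2:]]
print(np.sort(top))  # -> [ 50. 120.]
```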
Likelihood

In statistics, the likelihood function (often simply called the likelihood) measures the goodness of fit of a statistical model to a sample of data for given values of the unknown parameters.

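A tiny example of a likelihood function and its maximization (the coin-flip data is made up): the grid maximum of the log-likelihood reproduces the closed-form MLE heads / (heads + tails).

```python
import numpy as np

# Hypothetical coin-flip data: 10 flips, 7 heads.
# The likelihood of parameter p (probability of heads) given this data is
# L(p) = p^7 * (1 - p)^3; we maximize its log over a grid of p values.
heads, tails = 7, 3

p_grid = np.linspace(0.01, 0.99, 99)
log_lik = heads * np.log(p_grid) + tails * np.log(1 - p_grid)

p_mle = p_grid[np.argmax(log_lik)]
print(f"MLE: {p_mle:.2f}")  # -> 0.70, i.e. heads / (heads + tails)
```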
Generative vs Discriminative

A generative model learns the distribution of the data, while a discriminative model learns the boundary between different categories of data. Discriminative models are often better at pure prediction performance.
