Blog – Dontloo's Blog

Backpropagation February 02, 2022 Backpropagation: At training time, a neural network can be seen as a function that takes a vector input and outputs a scalar loss, \(l=loss(f(\mathbf{x}))\). Backpropagation is the process of computing the derivatives by applying the chain rule, \[\frac{\partial l}{\partial \mathbf{x}}=\frac{\partial l}{\partial \mathbf{f}}\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\] where \(\mathbf{f}\) denotes the network's predictions, which are usually a vector.
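As a rough illustration of the chain rule above, here is a minimal sketch in plain Python. The linear "network" \(\mathbf{f}=W\mathbf{x}\), the squared-error loss, and all names (`forward`, `loss`, `grad_x`, `numeric_grad_x`) are assumptions made for this example, not taken from the post. It computes \(\partial l/\partial \mathbf{x}\) as the product of \(\partial l/\partial \mathbf{f}\) and \(\partial \mathbf{f}/\partial \mathbf{x}\), then compares it with a finite-difference estimate.

```python
def forward(W, x):
    # f = W x, a vector of predictions (assumed toy network)
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def loss(f, t):
    # l = 0.5 * ||f - t||^2, a scalar (assumed loss)
    return 0.5 * sum((fi - ti) ** 2 for fi, ti in zip(f, t))

def grad_x(W, x, t):
    f = forward(W, x)
    dl_df = [fi - ti for fi, ti in zip(f, t)]  # dl/df
    # dl/dx_j = sum_i dl/df_i * df_i/dx_j, and df_i/dx_j = W[i][j]
    return [sum(dl_df[i] * W[i][j] for i in range(len(W)))
            for j in range(len(x))]

def numeric_grad_x(W, x, t, eps=1e-6):
    # Central finite differences, used only as a sanity check
    grads = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        grads.append((loss(forward(W, xp), t) - loss(forward(W, xm), t)) / (2 * eps))
    return grads

W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
x = [0.2, -0.4]
t = [1.0, 0.0, -1.0]
analytic = grad_x(W, x, t)
numeric = numeric_grad_x(W, x, t)
```

The two gradients should agree up to rounding error, which is a common way to check a hand-written backward pass.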
Multiclass Classification and Multiple Binary Classifiers January 04, 2019 Placeholder
Atomic Operations and Randomness December 12, 2018 Placeholder
All (I Know) about Markov Chains June 14, 2018 Markov Chains: In this post, "Markov chains" refers to discrete-time homogeneous Markov chains.
Summary of Latent Dirichlet Allocation March 27, 2018 What does LDA do? The LDA model converts a bag-of-words document into a (sparse) vector, where each dimension corresponds to a topic, and topics are learned to capture statistical relations between words. Here’s a nice illustration from Bayesian Methods for Machine Learning by National Research University Higher School of Economics.
Expectation Maximization Sketch March 06, 2018 Intro: Say we have observed data \(X\), a latent variable \(Z\) and a parameter \(\theta\), and we want to maximize the log-likelihood \(\log p(X|\theta)\). Sometimes this is not an easy task, perhaps because it doesn’t have a closed-form solution, the gradient is difficult to compute, or there are complicated constraints that \(\theta\) must satisfy.
Central Limit Theorem and Law of Large Numbers October 20, 2017 This post collects some notes on the Central Limit Theorem and (un)related stuff. First of all, a very good lecture: Lecture 29: Law of Large Numbers and Central Limit Theorem | Statistics 110.
Cortana Skills Walkthrough August 23, 2017 At the time of writing, Cortana supports two ways of creating skills as shown here.
Softmax June 18, 2017 Normalization functions: There are many ways of doing normalization; in this article we focus on the following type \[y=f(x) \text{ s.t. } y_i\geq 0, \sum y_i=1. \] Such normalization functions are widely used in the machine learning field, for example, to represent discrete probability distributions.
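For concreteness, softmax is one function of this type. The sketch below is a minimal plain-Python version; the name `softmax` and the max-subtraction trick for numerical stability are choices made for this example, not necessarily how the post implements it.

```python
import math

def softmax(x):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

y = softmax([2.0, 1.0, -3.0])
```

Every output is nonnegative and the outputs sum to one, so `y` can be read as a discrete probability distribution.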
Model the Joint Likelihood? May 31, 2017 When building a classifier, we often optimize the conditional log-likelihood \(\log p_\theta(t|x)=\log Multi(t|y(x), n=1) = \sum t_k\log y_k \) w.r.t. some parameter \(\theta\), where \(x\) is the input, \(t\) is the target and \(y\) is the output of the classifier (network), which is guaranteed to be nonnegative and to satisfy \(\sum y_k=1\) via normalization (e.g. softmax). Most normalizations do something like \(y_k = \hat{y_k}/\sum \hat{y_j}\). Following the multinomial model we can interpret each \(y_k\) as \(p_\theta(t_k|x)\). We could further interpret \(\hat{y_k}\) (nonnegative) as \(p_\theta(t_k,x)\), and then it follows that \(p_\theta(t_k|x) = \hat{y_k}/\sum \hat{y_j} = p_\theta(t_k,x)/p_\theta(x) \), which is just Bayes' rule. So we have the joint probability defined, and it seems we can directly read off \(\sum \hat{y_j}\) as the marginal likelihood \(p_\theta(x)\). The problem, however, is that \(p_\theta(x)\) could be just any distribution (say very high for one input and very low for another, or vice versa): as long as the corresponding \(p_\theta(t_k,x)\) are scaled by the same factor, the loss barely changes. It can still be a meaningful distribution if we introduce some assumptions about \(p_\theta(x)\).
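The scale issue can be checked numerically. The following is a minimal sketch, assuming the simple normalization \(y_k = \hat{y_k}/\sum \hat{y_j}\) described above; the function name `conditional_log_likelihood` and the example values are hypothetical.

```python
import math

def conditional_log_likelihood(y_hat, t):
    # y_k = y_hat_k / sum_j y_hat_j, then sum_k t_k * log(y_k)
    total = sum(y_hat)
    y = [v / total for v in y_hat]
    return sum(tk * math.log(yk) for tk, yk in zip(t, y) if tk > 0)

t = [1, 0, 0]                       # one-hot target
y_hat = [2.0, 1.0, 1.0]             # unnormalized outputs, sum = 4
scaled = [1000.0 * v for v in y_hat]  # same ratios, sum = 4000

ll_original = conditional_log_likelihood(y_hat, t)
ll_scaled = conditional_log_likelihood(scaled, t)
```

Both calls give the same value, \(\log 0.5\), even though the sums \(\sum \hat{y_j}\) differ by a factor of 1000. The conditional loss alone therefore places no constraint on what is read off as \(p_\theta(x)\).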
Naive Bayes and Logistic Regression April 25, 2017 Introduction: Naive Bayes and logistic regression are two basic machine learning models that are frequently compared, especially as the generative/discriminative counterparts of one another. At first sight, however, the two methods seem rather different: in naive Bayes we just count the frequencies of features and labels, while in logistic regression we optimize the parameters with regard to some loss function. If we express these two models as probabilistic graphical models, we’ll see exactly how they are related.
MLE, MAP and Bayesian Methods March 09, 2017 MLE and MAP: One of the most common situations is that we have a model that can produce the (unnormalized) probability \( p(x|\theta) \) for some observation \( x \). We are often interested in the most probable \( \theta \) given the data, i.e. \( \theta^* = \arg\max_\theta p(\theta|x) \).
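As a small worked example of the difference between maximizing \( p(x|\theta) \) and \( p(\theta|x) \), the sketch below uses a coin-flip model with a Beta prior. The model, the prior parameters `a` and `b`, the data, and the grid search are all assumptions made for illustration and do not come from the post.

```python
import math

def log_likelihood(theta, heads, tails):
    # log p(x | theta) for independent coin flips
    return heads * math.log(theta) + tails * math.log(1 - theta)

def log_prior(theta, a, b):
    # log of a Beta(a, b) density, up to an additive constant
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def argmax_on_grid(score, steps=10000):
    # Brute-force search over theta in (0, 1); crude but easy to verify
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=score)

heads, tails = 7, 3
a, b = 2.0, 2.0

# MLE maximizes the likelihood only
theta_mle = argmax_on_grid(lambda th: log_likelihood(th, heads, tails))
# MAP maximizes likelihood times prior, which is proportional to the posterior
theta_map = argmax_on_grid(
    lambda th: log_likelihood(th, heads, tails) + log_prior(th, a, b))
```

For this model the closed-form answers are `heads / (heads + tails)` for the MLE and `(heads + a - 1) / (heads + tails + a + b - 2)` for the MAP estimate, so the prior pulls the estimate from 0.7 towards 0.5.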
Notes on Coding Neural Networks November 08, 2016 Encapsulated Neural Network Libraries: There are many great open source libraries for neural networks and deep learning. Some of them try to wrap every function they provide into a uniform interface or protocol (so-called define-and-run, e.g. caffe and the tensorflow frontend). Such well-encapsulated libraries might be easy to use but are difficult to change. With the rapid development of deep learning, it has become a common need for people in the field to experiment with new ideas beyond those encapsulations, and I have often found that the very interface or protocol I need is just the programming language itself.
More on Joint Bayesian Verification October 23, 2016 Derivations: The following are some detailed derivations of the formulas in the paper Bayesian Face Revisited: A Joint Formulation.
Conditional Random Fields Summary October 16, 2016 The Big Picture
Recipe for 99%+ Accuracy Face Recognition October 13, 2016 Intro: The title is exaggerated; by “99%+ accuracy face recognition” I actually mean “99%+ accuracy on the LFW dataset”. This recipe contains every big idea you need to know to reproduce the results, and it depends on public data sets only.
About Search Algorithms October 08, 2016 Tree Based and Hash Based Search: Perhaps the simplest applications of tree based and hash based search are tree maps and hash maps.
Miscellaneous October 07, 2016 Probability distributions over the whole real line: Why do many distributions over the real number line decrease towards both ends (e.g. Cauchy, Gaussian)? Why can there not be a uniform distribution over the entire space? Because we have to make sure the PDF integrates to one.
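To spell out the last step: suppose a density were constant over the whole real line, \(p(x)=c\) with \(c\geq 0\) (the symbol \(c\) is introduced here only for illustration). Then \[\int_{-\infty}^{\infty} c\,dx=\lim_{a\to\infty}2ac=\begin{cases}0, & c=0\\ \infty, & c>0\end{cases}\] so no choice of \(c\) makes the integral equal to one. A valid density on the real line must therefore have tails whose total mass is finite, which is why the Gaussian and Cauchy densities decay towards both ends.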
How to Create a Blog Like This October 07, 2016 First create a repo on Github with the name username.github.io. Copy everything from https://github.com/mmistakes/so-simple-theme.git (or another Jekyll theme repo) to the repo. In the _config.yml file, set url: https://username.github.io. Celebrate.