- Backpropagation At training time, a neural network can be seen as a function that takes a vector input and outputs a scalar loss, \(l=loss(f(\mathbf{x}))\). Backpropagation is the process of computing the derivatives by applying the chain rule,\[\frac{\partial l}{\partial \mathbf{x}}=\frac{\partial l}{\partial \mathbf{f}}\frac{\partial \mathbf{f}}{\partial \mathbf{x}},\]where \(\mathbf{f}\) denotes the network predictions, usually a vector. (A numeric sketch appears after this list.)
- Argsort and Ranks Placeholder. https://stackoverflow.com/questions/40808772/tensorflow-indicator-matrix-for-top-n-values
- Multiclass Classification and Multiple Binary Classifiers Placeholder
- Atomic Operations and Randomness Placeholder
- All (I Know) about Markov Chains By Markov chains we refer, throughout this blog, to discrete-time homogeneous Markov chains. (A simulation sketch appears after this list.)
- Summary of Latent Dirichlet Allocation What does LDA do? The LDA model converts a bag-of-words document into a (sparse) vector, where each dimension corresponds to a topic, and topics are learned to capture statistical relations between words. Here's a nice illustration from Bayesian Methods for Machine Learning by National Research University Higher School of Economics. (A code sketch appears after this list.)
- Expectation Maximization Sketch Say we have observed data \(X\), a latent variable \(Z\), and a parameter \(\theta\); we want to maximize the log-likelihood \(\log p(X|\theta)\). Sometimes this is not an easy task: there may be no closed-form solution, the gradient may be difficult to compute, or there may be complicated constraints that \(\theta\) must satisfy. (A minimal EM loop appears after this list.)
- Central Limit Theorem and Law of Large Numbers This post covers some ways of understanding the Central Limit Theorem and (un)related topics. First of all, a very good lecture: Lecture 29: Law of Large Numbers and Central Limit Theorem | Statistics 110. (A simulation sketch appears after this list.)
- Cortana Skills Walkthrough At the time of writing, Cortana supports two ways of creating skills as shown here.
- Softmax Normalization functions: There are many ways of doing normalization; in this article we focus on the following type\[y=f(x) \text{ s.t. } y_i\geq 0, \sum_i y_i=1.\]Such normalization functions are widely used in the machine learning field, for example, to represent discrete probability distributions. (A stable implementation sketch appears after this list.)
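
For the Backpropagation entry above, here is a minimal NumPy sketch of the chain rule \(\frac{\partial l}{\partial \mathbf{x}}=\frac{\partial l}{\partial \mathbf{f}}\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\). The linear "network" `W`, input `x`, and target `t` are made-up toys, not taken from the post.

```python
import numpy as np

# Toy setup (hypothetical): predictions f(x) = W @ x, squared-error loss.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))       # weights of the toy "network"
x = rng.normal(size=4)            # input vector
t = rng.normal(size=3)            # target

f = W @ x                         # forward pass: predictions
l = 0.5 * np.sum((f - t) ** 2)    # scalar loss

dl_df = f - t                     # dl/df, shape (3,)
df_dx = W                         # Jacobian df/dx, shape (3, 4)
dl_dx = dl_df @ df_dx             # chain rule, shape (4,)

# Sanity check: central finite difference on the first coordinate.
eps = 1e-6
e0 = np.zeros(4); e0[0] = eps
num = (0.5 * np.sum((W @ (x + e0) - t) ** 2)
       - 0.5 * np.sum((W @ (x - e0) - t) ** 2)) / (2 * eps)
print(np.isclose(dl_dx[0], num))  # True
```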
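
For the Markov chains entry, a short simulation of a discrete-time homogeneous chain: the transition matrix `P` is the same at every step, and the next state depends only on the current one. The three-state `P` below is an arbitrary illustration.

```python
import numpy as np

# Hypothetical 3-state transition matrix; rows sum to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

rng = np.random.default_rng(0)
state, counts = 0, np.zeros(3)
for _ in range(100_000):
    state = rng.choice(3, p=P[state])  # homogeneous: same P at every step
    counts[state] += 1

print(counts / counts.sum())  # empirical long-run state frequencies
```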
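
For the LDA entry, a sketch of "bag-of-words document in, topic vector out" using scikit-learn's `LatentDirichletAllocation`; the four tiny documents and the two-topic setting are invented for illustration, not from the post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs and cats",
        "stocks and bonds and markets",
        "dogs chase cats",
        "markets move stocks"]

counts = CountVectorizer().fit_transform(docs)   # bag-of-words count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)           # one topic-proportion row per doc
print(doc_topics.round(2))                       # each row sums to 1
```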
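
For the EM entry, a minimal E-step/M-step loop under an assumed toy model: a two-component 1-D Gaussian mixture with unit variances, so \(\theta\) reduces to the component means and mixture weights.

```python
import numpy as np
from scipy.stats import norm

# X observed; Z (component labels) latent; theta = (means mu, weights pi).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

mu, pi = np.array([-1.0, 1.0]), np.array([0.5, 0.5])  # initial theta
for _ in range(50):
    # E-step: responsibilities p(Z | X, theta).
    dens = pi * norm.pdf(X[:, None], loc=mu, scale=1.0)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: maximize the expected complete-data log-likelihood in theta.
    mu = (resp * X[:, None]).sum(axis=0) / resp.sum(axis=0)
    pi = resp.mean(axis=0)

print(mu.round(2), pi.round(2))  # roughly [-2, 3] and [0.4, 0.6]
```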
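
For the CLT/LLN entry, a quick simulation, assuming Exponential(1) samples (mean 1, variance 1): the sample means concentrate at 1 (LLN), and \(\sqrt{n}(\bar{X}_n - 1)\) looks standard normal (CLT).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 10_000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)  # 10k sample means

print(means.mean().round(3))                # close to 1 (LLN)
z = np.sqrt(n) * (means - 1.0)              # standardized (variance is 1)
print(z.mean().round(2), z.std().round(2))  # close to 0 and 1 (CLT)
```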
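
For the Softmax entry, a numerically stable sketch of one such normalization function, satisfying \(y_i\geq 0\) and \(\sum_i y_i=1\); subtracting the max exploits the shift-invariance of softmax to avoid overflow.

```python
import numpy as np

def softmax(x):
    """Map a real vector to a discrete probability distribution."""
    z = x - np.max(x)   # shift-invariant, prevents exp overflow
    e = np.exp(z)
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
print(y, y.sum())       # non-negative entries summing to 1
```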