- Backpropagation At training time, a neural network can be seen as a function that takes a vector input and outputs a scalar loss, \(l=loss(f(\mathbf{x}))\). Backpropagation is the process of computing the derivatives by applying the chain rule,\[\frac{\partial l}{\partial \mathbf{x}}=\frac{\partial l}{\partial \mathbf{f}}\frac{\partial \mathbf{f}}{\partial \mathbf{x}},\]where \(\mathbf{f}\) denotes the network predictions, usually a vector. (A numeric sketch appears after this list.)
- Argsort and Ranks Placeholder. https://stackoverflow.com/questions/40808772/tensorflow-indicator-matrix-for-top-n-values
- Multiclass Classification and Multiple Binary Classifiers Placeholder
- Atomic Operations and Randomness Placeholder
- All (I Know) about Markov Chains By Markov chains we refer, throughout this blog, to discrete-time homogeneous Markov chains. (A simulation sketch appears after this list.)
- Summary of Latent Dirichlet Allocation What does LDA do? The LDA model converts a bag-of-words document into a (sparse) vector, where each dimension corresponds to a topic, and topics are learned to capture statistical relations between words. Here's a nice illustration from Bayesian Methods for Machine Learning by National Research University Higher School of Economics. (A code sketch appears after this list.)
- Expectation Maximization Sketch Say we have observed data \(X\), a latent variable \(Z\), and a parameter \(\theta\); we want to maximize the log-likelihood \(\log p(X|\theta)\). Sometimes this is not an easy task: there may be no closed-form solution, the gradient may be difficult to compute, or there may be complicated constraints that \(\theta\) must satisfy. (A minimal EM loop appears after this list.)
- Central Limit Theorem and Law of Large Numbers This post covers some ways of understanding the Central Limit Theorem and (un)related topics. First of all, a very good lecture: Lecture 29: Law of Large Numbers and Central Limit Theorem | Statistics 110. (A simulation sketch appears after this list.)
- Cortana Skills Walkthrough At the time of writing, Cortana supports two ways of creating skills as shown here.
- Softmax Normalization functions: There are many ways of doing normalization; in this article we focus on the following type\[y=f(x) \text{ s.t. } y_i\geq 0, \sum_i y_i=1.\]Such normalization functions are widely used in the machine learning field, for example, to represent discrete probability distributions. (A stable implementation sketch appears after this list.)
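
For the Backpropagation entry above, here is a minimal NumPy sketch of the chain rule \(\frac{\partial l}{\partial \mathbf{x}}=\frac{\partial l}{\partial \mathbf{f}}\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\). The linear "network" `W`, input `x`, and target `t` are made-up toys, not taken from the post.

```python
import numpy as np

# Toy setup (hypothetical): predictions f(x) = W @ x, squared-error loss.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))       # weights of the toy "network"
x = rng.normal(size=4)            # input vector
t = rng.normal(size=3)            # target

f = W @ x                         # forward pass: predictions
l = 0.5 * np.sum((f - t) ** 2)    # scalar loss

dl_df = f - t                     # dl/df, shape (3,)
df_dx = W                         # Jacobian df/dx, shape (3, 4)
dl_dx = dl_df @ df_dx             # chain rule, shape (4,)

# Sanity check: central finite difference on the first coordinate.
eps = 1e-6
e0 = np.zeros(4); e0[0] = eps
num = (0.5 * np.sum((W @ (x + e0) - t) ** 2)
       - 0.5 * np.sum((W @ (x - e0) - t) ** 2)) / (2 * eps)
print(np.isclose(dl_dx[0], num))  # True
```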
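
For the Markov chains entry, a short simulation of a discrete-time homogeneous chain: the transition matrix `P` is the same at every step, and the next state depends only on the current one. The three-state `P` below is an arbitrary illustration.

```python
import numpy as np

# Hypothetical 3-state transition matrix; rows sum to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

rng = np.random.default_rng(0)
state, counts = 0, np.zeros(3)
for _ in range(100_000):
    state = rng.choice(3, p=P[state])  # homogeneous: same P at every step
    counts[state] += 1

print(counts / counts.sum())  # empirical long-run state frequencies
```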
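
For the LDA entry, a sketch of "bag-of-words document in, topic vector out" using scikit-learn's `LatentDirichletAllocation`; the four tiny documents and the two-topic setting are invented for illustration, not from the post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs and cats",
        "stocks and bonds and markets",
        "dogs chase cats",
        "markets move stocks"]

counts = CountVectorizer().fit_transform(docs)   # bag-of-words count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)           # one topic-proportion row per doc
print(doc_topics.round(2))                       # each row sums to 1
```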
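
For the EM entry, a minimal E-step/M-step loop under an assumed toy model: a two-component 1-D Gaussian mixture with unit variances, so \(\theta\) reduces to the component means and mixture weights.

```python
import numpy as np
from scipy.stats import norm

# X observed; Z (component labels) latent; theta = (means mu, weights pi).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

mu, pi = np.array([-1.0, 1.0]), np.array([0.5, 0.5])  # initial theta
for _ in range(50):
    # E-step: responsibilities p(Z | X, theta).
    dens = pi * norm.pdf(X[:, None], loc=mu, scale=1.0)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: maximize the expected complete-data log-likelihood in theta.
    mu = (resp * X[:, None]).sum(axis=0) / resp.sum(axis=0)
    pi = resp.mean(axis=0)

print(mu.round(2), pi.round(2))  # roughly [-2, 3] and [0.4, 0.6]
```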
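
For the CLT/LLN entry, a quick simulation, assuming Exponential(1) samples (mean 1, variance 1): the sample means concentrate at 1 (LLN), and \(\sqrt{n}(\bar{X}_n - 1)\) looks standard normal (CLT).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 10_000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)  # 10k sample means

print(means.mean().round(3))                # close to 1 (LLN)
z = np.sqrt(n) * (means - 1.0)              # standardized (variance is 1)
print(z.mean().round(2), z.std().round(2))  # close to 0 and 1 (CLT)
```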
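
For the Softmax entry, a numerically stable sketch of one such normalization function, satisfying \(y_i\geq 0\) and \(\sum_i y_i=1\); subtracting the max exploits the shift-invariance of softmax to avoid overflow.

```python
import numpy as np

def softmax(x):
    """Map a real vector to a discrete probability distribution."""
    z = x - np.max(x)   # shift-invariant, prevents exp overflow
    e = np.exp(z)
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
print(y, y.sum())       # non-negative entries summing to 1
```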