Tuesday, July 6, 2010

Ten Simple Rules (Research Field)

Mathematical Concepts

1 The Axiom of Choice
2 The Axiom of Determinacy
3 Bayesian Analysis
4 Braid Groups
5 Buildings
6 Calabi-Yau Manifolds
7 Cardinals
8 Categories
9 Compactness and Compactification
10 Computational Complexity Classes
11 Countable and Uncountable Sets
12 C*-Algebras
13 Curvature
14 Designs
15 Determinants
16 Differential Forms and Integration
17 Dimension
18 Distributions
19 Duality
20 Dynamical Systems and Chaos
21 Elliptic Curves
22 The Euclidean Algorithm and Continued Fractions
23 The Euler and Navier-Stokes Equations
24 Expanders
25 The Exponential and Logarithmic Functions
26 The Fast Fourier Transform
27 The Fourier Transform
28 Fuchsian Groups
29 Function Spaces
30 Galois Groups
31 The Gamma Function
32 Generating Functions
33 Genus
34 Graphs
35 Hamiltonians
36 The Heat Equation
37 Hilbert Spaces
38 Homology and Cohomology
39 Homotopy Groups
40 The Ideal Class Group
41 Irrational and Transcendental Numbers
42 The Ising Model
43 Jordan Normal Form
44 Knot Polynomials
45 K-Theory
46 The Leech Lattice
47 L-Functions
48 Lie Theory
49 Linear and Nonlinear Waves and Solitons
50 Linear Operators and Their Properties
51 Local and Global in Number Theory
52 The Mandelbrot Set
53 Manifolds
54 Matroids
55 Measures
56 Metric Spaces
57 Models of Set Theory
58 Modular Arithmetic
59 Modular Forms
60 Moduli Spaces
61 The Monster Group
62 Normed Spaces and Banach Spaces
63 Number Fields
64 Optimization and Lagrange Multipliers
65 Orbifolds
66 Ordinals
67 The Peano Axioms
68 Permutation Groups
69 Phase Transitions
70 π
71 Probability Distributions
72 Projective Space
73 Quadratic Forms
74 Quantum Computation
75 Quantum Groups
76 Quaternions, Octonions, and Normed Division Algebras
77 Representations
78 Ricci Flow
79 Riemann Surfaces
80 The Riemann Zeta Function
81 Rings, Ideals, and Modules
82 Schemes
83 The Schrödinger Equation
84 The Simplex Algorithm
85 Special Functions
86 The Spectrum
87 Spherical Harmonics
88 Symplectic Manifolds
89 Tensor Products
90 Topological Spaces
91 Transforms
92 Trigonometric Functions
93 Universal Covers
94 Variational Methods
95 Varieties
96 Vector Bundles
97 Von Neumann Algebras
98 Wavelets
99 The Zermelo-Fraenkel Axioms
100 Mathematical Statistics

Sunday, July 4, 2010

Machine Learning: A Probabilistic Approach


- A new textbook by Kevin Murphy (in preparation,
to be published by MIT Press, c. December 2011)


- Introduction to Machine Learning
- Publications by Kevin Murphy
- UBC Machine Learning Reading Group (MLRG)
- Kevin Murphy @ UBC (great resource for students)
- Kevin Murphy / Teaching
- Graduate, CS 540 (Machine learning) Fall 2008
- Graduate, Stat 406 (Algorithms for Classification and Prediction)

- Causal models as conditional density models
- Product Partition Models for Modelling
Changing Dependency Structure in Time Series



1 Introduction
1.1 Introduction
1.2 Classification
1.2.1 Example
1.2.2 Probabilistic output
1.2.3 Decision trees
1.2.4 K-nearest neighbor classifier
1.2.5 Logistic regression
1.2.6 Linear separability
1.2.7 Transforming the input variables
1.2.8 Kernel functions
1.2.9 Multinomial logistic regression
1.2.10 No free lunch theorem
1.2.11 Real world applications
1.3 Regression
1.3.1 Linear regression
1.3.2 Maximum likelihood and least squares
1.3.3 Real world applications
1.4 Unsupervised learning
1.4.1 Discovering clusters
1.4.2 Discovering latent factors
1.4.3 Discovering graph structure
1.4.4 Discovering relevant variables
1.4.5 Data compression
1.4.6 Data imputation
1.4.7 Anomaly detection
1.5 Other kinds of machine learning
1.5.1 Relational learning
1.5.2 Semi-supervised learning
1.5.3 Reinforcement learning
1.6 Overfitting
1.6.1 Examples of overfitting
1.6.2 The benefits of more data
1.6.3 Regularization
1.7 A probabilistic approach: what and why?
1.7.1 Degrees of Bayesianism
1.7.2 Representing uncertainty
1.7.3 Predicting the future
1.7.4 The small N, large D problem
1.7.5 Model selection
1.7.6 Discussion and further reading

==============
I Probability
==============

2 Probability I
2.1 Introduction
2.2 What is probability?
2.3 Discrete random variables
2.4 Continuous random variables
2.5 Summaries of distributions
2.6 Some useful continuous distributions

3 Probability II
3.1 Introduction
3.2 Joint probability distributions
3.3 Functions of random variables
3.4 The exponential family *
3.5 Exchangeability *

4 The multivariate Gaussian and friends
4.1 Introduction
4.2 Definition and basic properties
4.3 Mahalanobis distance
4.4 Eigenanalysis of covariance matrices
4.5 Probabilistic inference
4.6 Information form *
4.7 Linear Gaussian systems
4.8 The multivariate Student T distribution *
4.9 Wishart distribution

5 Generative models for classification
5.1 Introduction
5.2 Discriminant analysis
5.3 Robust discriminant analysis
5.4 Naive Bayes classifiers
5.5 Log-sum-exp trick
5.6 Generative vs discriminative classifiers

6 Markov models
6.1 Introduction
6.2 Transition matrix
6.3 Language modeling
6.4 Stationary distribution
6.5 Google's PageRank algorithm
6.6 Conditions under which a stationary distribution exists: finite-state case
6.7 Conditions under which a stationary distribution exists: general case *
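
(Quick aside: the stationary distribution of 6.4, and the damping idea behind PageRank in 6.5, fit in a few lines of Python. The 3-state transition matrix below is made up for illustration; this is a sketch of the idea, not the book's code.)

    import numpy as np

    # A made-up 3-state transition matrix; each row sums to 1.
    T = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.7, 0.1],
                  [0.0, 0.3, 0.7]])

    # PageRank-style damping: mix with the uniform distribution so the
    # chain is irreducible and aperiodic, hence has a unique stationary pi.
    d = 0.85
    n = T.shape[0]
    G = d * T + (1 - d) * np.ones((n, n)) / n

    pi = np.ones(n) / n
    for _ in range(100):        # power iteration: pi <- pi G
        pi = pi @ G
    print(pi)                   # converged when pi G == pi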

7 Directed graphical models (Unfinished)
7.1 Introduction
7.2 Example: medical diagnosis
7.3 Plate notation
7.4 Inference
7.5 Using DGMs to make generative classifiers
7.6 Conditional independence properties of DAGs
7.7 Causal interpretation of DAGs *

8 Information theory
8.1 Introduction
8.2 Entropy
8.3 Mutual information
8.4 Relative entropy (KL divergence)
8.5 Jensen's inequality and its consequences
8.6 Maximum entropy principle
8.7 Data compression, probability and entropy
8.8 Minimum description length (MDL) principle
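
(Entropy from 8.2 and relative entropy from 8.4 are one-liners over a discrete distribution; a minimal sketch, in bits:)

    import numpy as np

    def entropy(p):
        """H(p) = -sum_i p_i log2 p_i, with 0 log 0 taken as 0."""
        p = np.asarray(p, dtype=float)
        nz = p > 0
        return -np.sum(p[nz] * np.log2(p[nz]))

    def kl(p, q):
        """KL(p || q) = sum_i p_i log2(p_i / q_i); needs q > 0 wherever p > 0."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        nz = p > 0
        return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

    print(entropy([0.5, 0.5]))         # 1.0 bit: a fair coin
    print(kl([0.5, 0.5], [0.9, 0.1]))  # > 0; zero only when p == q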

==============
II Statistics
==============

9 Maximum likelihood estimation
9.1 Introduction
9.2 MLE for linear regression
9.3 MLE for univariate Gaussian distribution
9.4 MLE for binomial and Bernoulli distributions
9.5 MLE for multinomial and multinoulli distributions
9.6 MLE for a naive Bayes classifier
9.7 MLE for Markov models
9.8 MLE for DGMs
9.9 MLE for multivariate Gaussian
9.10 MLE for discriminant analysis
9.11 Sufficient statistics
9.12 MLE for the exponential family *
9.13 Some problems with maximum likelihood estimation
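
(For 9.3 the MLEs have the familiar closed form: mu_hat is the sample mean and sigma^2_hat the mean squared deviation with an N, not N-1, denominator; the bias of the latter is one of the problems 9.13 alludes to. A sketch on synthetic data:)

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=1000)   # synthetic data

    mu_hat  = x.mean()                  # MLE of the mean
    var_hat = ((x - mu_hat)**2).mean()  # MLE of the variance (divide by N)
    print(mu_hat, var_hat)              # roughly 5 and 4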

10 Bayesian inference for single parameter models
10.1 Statistical inference
10.2 Bayesian model of human concept learning
(inference of a discrete parameter)
10.3 Tossing coins (inference for the binomial rate parameter)
10.4 Summarizing posterior distributions
10.5 Why not just use MAP estimation? *
10.6 Priors
10.7 The marginal likelihood (evidence)
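
(The coin-tossing example of 10.3 has a one-line conjugate update: a Beta(a, b) prior on the heads-rate plus N1 heads in N tosses gives a Beta(a + N1, b + N - N1) posterior. A sketch; the hyperparameters and counts are my own, not the book's:)

    from scipy import stats

    a, b = 2, 2        # Beta prior on the heads-rate theta
    N, N1 = 20, 15     # observed: 15 heads in 20 tosses

    post = stats.beta(a + N1, b + (N - N1))  # conjugate posterior
    print(post.mean())                       # posterior mean (a + N1)/(a + b + N)
    print(post.interval(0.95))               # 95% credible interval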

11 Bayesian inference for multi parameter models
11.1 Bayesian fitting of discrete variable models
11.2 Bayesian fitting of univariate Gaussian
11.3 Bayesian fitting of multivariate Gaussian *
11.4 Bayesian fitting of discriminant analysis models
11.5 Bayesian fitting of a linear regression model
11.6 Bayesian fitting of the exponential family *

12 Decision theory
12.1 Introduction
12.2 Bayesian decision theory
12.3 Frequentist decision theory *
12.4 Empirical risk minimization *
12.5 The false positive vs false negative tradeoff

13 Bayesian model selection and checking
13.1 Introduction
13.2 Bayes factor
13.3 Testing equality of two binomial rate parameters
13.4 Testing equality of two Gaussian means (Bayesian T-tests) *
13.5 Testing for differentially expressed genes *
13.6 Testing for Markovianity
13.7 BIC approximation to log marginal likelihood
13.8 Jeffreys-Lindley paradox *
13.9 Model checking
13.10 Posterior-predictive p-values

14 Empirical and Hierarchical Bayes
14.1 Empirical Bayes
14.2 Empirical Bayes for linear regression
14.3 Hierarchical Bayes *
14.4 Multiple binomial variables
14.5 Multiple Gaussian variables
14.6 Bayesian fitting of Markov language models
14.7 Fitting multiple related linear regressions

================
III Computation
================

15 Unconstrained optimization
15.1 Introduction
15.2 Kinds of optimization problems
15.3 Local vs global optima
15.4 Computational complexity of finding an optimum *
15.5 Overview of algorithms
15.6 Examples of optimization problems
15.7 First-order methods
15.8 Second-order methods
15.9 Comparison of algorithms
15.10 Matlab software
15.11 Gradient-free unconstrained optimization
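
(The simplest first-order method of 15.7 is steepest descent with a fixed step size; a toy sketch on a convex objective of my own choosing:)

    import numpy as np

    def f(w):       # a convex toy objective (illustrative only)
        return 0.5 * w @ w + np.sin(w).sum()

    def grad(w):    # its exact gradient
        return w + np.cos(w)

    w, lr = np.zeros(3), 0.1
    for _ in range(200):        # steepest descent: w <- w - lr * grad f(w)
        w = w - lr * grad(w)
    print(w, f(w))              # at the optimum, w + cos(w) ~= 0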

16 Expectation maximization (EM) algorithm
16.1 Introduction
16.2 Basic idea
16.3 Example: computing MLE for MVN from partially observed data
16.4 Example: MLE for the multivariate Student distribution *
16.5 EM theory *
16.6 Bound optimization / MM algorithms
16.7 EM variants *
16.8 Convergence rate of EM *
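
(The basic idea of 16.2 in code: for a two-component 1-D Gaussian mixture, alternate an E step (responsibilities) with an M step (weighted MLEs). Everything below -- data, initialization -- is made up for illustration:)

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

    w, mu, var = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])

    def npdf(x, m, v):
        return np.exp(-0.5 * (x - m)**2 / v) / np.sqrt(2 * np.pi * v)

    for _ in range(50):
        # E step: responsibility of component 1 for each point.
        p1 = w * npdf(x, mu[0], var[0])
        p2 = (1 - w) * npdf(x, mu[1], var[1])
        r = p1 / (p1 + p2)
        # M step: weighted maximum-likelihood updates.
        w  = r.mean()
        mu = np.array([np.sum(r * x) / r.sum(),
                       np.sum((1 - r) * x) / (1 - r).sum()])
        var = np.array([np.sum(r * (x - mu[0])**2) / r.sum(),
                        np.sum((1 - r) * (x - mu[1])**2) / (1 - r).sum()])

    print(w, mu, var)   # roughly 0.4, (-2, 3), (1, 1)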

17 Deterministic algorithms for approximate inference (Unfinished)
17.1 Introduction
17.2 Laplace approximation
17.3 Variational Bayes *
17.4 Bits-back coding
17.5 Variational free energy
17.6 The mean field method
17.7 VB for univariate Gaussian
17.8 Forward or reverse KL?
17.9 Software for variational inference

18 Stochastic algorithms for approximate inference (Unfinished)
18.1 Introduction
18.2 Generating random samples from a univariate distribution
18.3 Importance sampling (Unfinished)
18.4 Markov Chain Monte Carlo (MCMC)
18.5 Gibbs sampling
18.6 Metropolis Hastings algorithm
18.7 Accuracy of MCMC
18.8 Convergence
18.9 More advanced MCMC algorithms
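
(18.6 in miniature: random-walk Metropolis on a bimodal 1-D target. With a symmetric proposal the Hastings correction cancels, leaving the bare acceptance ratio. The target and tuning are my own toy choices:)

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):   # unnormalized: equal mixture of N(-2,1) and N(2,1)
        return np.logaddexp(-0.5 * (x + 2)**2, -0.5 * (x - 2)**2)

    x, samples = 0.0, []
    for _ in range(20000):
        prop = x + rng.normal(scale=1.0)   # symmetric random-walk proposal
        # Accept with probability min(1, target(prop) / target(x)).
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)

    samples = np.array(samples[5000:])     # discard burn-in
    print(samples.mean(), samples.std())   # roughly 0 and sqrt(5)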

19 Constrained optimization
19.1 Introduction
19.2 Lagrange multipliers and the KKT conditions
19.3 Examples
19.4 Primal-dual problems *
19.5 Overview of algorithms
19.6 Conjugate duality

20 Discrete optimization
20.1 Introduction
20.2 Dynamic programming
20.3 Branch and bound
20.4 Hill climbing/coordinate ascent
20.5 Stochastic local search
20.6 Tabu search
20.7 Evolutionary algorithms
20.8 Matlab software

============================
IV More supervised learning
============================

21 Linear regression
21.1 Introduction
21.2 Maximum likelihood and least squares revisited
21.3 Robust regression *
21.4 Censored regression *
21.5 Ridge regression revisited
21.6 Bayesian linear regression (unknown σ^2)
21.7 Empirical Bayes for linear regression
21.8 Variational Bayes for linear regression *
21.9 Kernel smoothers *
21.10 Semi-parametric regression *
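
(Ridge regression, revisited in 21.5, is the MAP estimate under a Gaussian prior: w = (X'X + lam*I)^{-1} X'y. A sketch on synthetic data; the sizes and lam are arbitrary:)

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=50)

    lam = 0.1   # ridge strength (illustrative)
    w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(w)    # close to w_true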

22 Generalized linear models
22.1 Introduction
22.2 Multinomial logistic regression
22.3 Bayesian logistic regression
22.4 Empirical Bayes for logistic regression
22.5 Variational Bayes for logistic regression
22.6 Generalized linear models
22.7 Probit regression *

23 Neural networks
23.1 Introduction
23.2 Kinds of neural networks
23.3 Parameter estimation, or, how to train a neural net
23.4 Implementation *
23.5 Bayesian neural networks *

24 Variable selection
24.1 Introduction
24.2 Detecting marginally relevant variables
24.3 Bayesian variable selection
24.4 Subset selection
24.5 l1 regularization
24.6 Variants on l1 regularization *
24.7 Non-convex regularization
24.8 Automatic relevancy determination (ARD)

25 Sparse kernel machines
25.1 Introduction
25.2 Kernels
25.3 The kernel trick
25.4 Support vector machines
25.5 Experimental comparison of kernel methods
25.6 When should you use kernel methods?

26 Gaussian processes (Unfinished)
26.1 Introduction
26.2 GPs for regression
26.3 GPs for classification (Unfinished)

27 Trees, forests and boosting (Unfinished)
27.1 Introduction

=============================
V More unsupervised learning
=============================

28 Mixture models
28.1 Introduction
28.2 ML estimation
28.3 MAP estimation
28.4 Selecting the number of mixture components
28.5 VB for mixtures of Gaussians
28.6 Non-parametric density estimation
28.7 Using mixture models for classification

29 Latent factor models (Unfinished)
29.1 Introduction
29.2 Factor analysis
29.3 Mixture of factor analyzers
29.4 Principal components analysis (PCA)
29.5 EPCA
29.6 Variational EM for discrete PPCA
29.7 Variational Bayes for PPCA

30 Hidden Markov models
30.1 What are HMMs?
30.2 The main inferential tasks, and some applications
30.3 The forwards-backwards algorithm
30.4 The Viterbi algorithm
30.5 Forwards filtering, backwards sampling
30.6 Parameter estimation
30.7 HMM variants *
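
(The forwards pass of 30.3, with per-step normalization; the log of the product of the normalizers is the sequence log-likelihood. The 2-state HMM below is entirely made up:)

    import numpy as np

    A   = np.array([[0.7, 0.3], [0.4, 0.6]])  # transition matrix
    B   = np.array([[0.9, 0.1], [0.2, 0.8]])  # B[s, o] = p(obs o | state s)
    pi0 = np.array([0.5, 0.5])                 # initial state distribution
    obs = [0, 0, 1, 1, 1]

    alpha = pi0 * B[:, obs[0]]
    loglik = np.log(alpha.sum()); alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum()); alpha /= alpha.sum()

    print(loglik, alpha)   # log p(obs) and the filtered state distribution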

31 State space models
31.1 Linear dynamical systems
31.2 Kalman filtering
31.3 Kalman smoothing
31.4 Parameter estimation *
31.5 Inference in nonlinear, non-Gaussian systems *
31.6 Particle filtering
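
(The scalar case of 31.2 makes the predict/update cycle transparent; Q, R and the random-walk dynamics are toy choices of mine:)

    import numpy as np

    rng = np.random.default_rng(0)
    Q, R = 0.1, 1.0            # process and observation noise variances
    x, ys = 0.0, []
    for _ in range(100):       # simulate a random-walk state, noisy obs
        x += rng.normal(scale=np.sqrt(Q))
        ys.append(x + rng.normal(scale=np.sqrt(R)))

    mu, P = 0.0, 1.0           # prior mean and variance
    for y in ys:
        P_pred = P + Q                   # predict (identity dynamics)
        K = P_pred / (P_pred + R)        # Kalman gain
        mu = mu + K * (y - mu)           # update with the innovation
        P = (1 - K) * P_pred

    print(mu, x)               # filtered estimate tracks the hidden state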

=========================
VI More graphical models
=========================

32 Undirected graphical models (Unfinished)
32.1 Introduction
32.2 Representing a joint distribution with a DGM
32.3 Inference
32.4 Learning (parameter estimation)
32.5 MRFs
32.6 CI properties of UGMs
32.7 Representing the joint distribution of a UGM
32.8 Conditional random fields
32.9 Inference
32.10 Learning (parameter estimation)

33 Inference in graphical models (Unfinished)
33.1 Introduction
33.2 Message passing (belief propagation)
33.3 Message passing (belief propagation) on trees
33.4 Variable elimination
33.5 Junction tree algorithm
33.6 Computational hardness of discrete inference
33.7 Inference in Gaussian models
33.8 Approximate inference
33.9 Mean field revisited
33.10 Loopy belief propagation (Unfinished)
33.11 Expectation propagation (Unfinished)

34 Network structure learning (Unfinished)
34.1 Structure learning for trees
34.2 Structure learning for DAGMs
34.3 Structure learning for Gaussian UGMs
34.4 Structure learning for discrete UGMs
34.5 Dependency networks

===========================
VII Supplementary material
===========================

35 Notation
35.1 General notation
35.2 Graph theory notation
35.3 Probability distributions
35.4 List of commonly used abbreviations
35.5 Big-O notation

36 Non-Bayesian inference (frequentist statistics)
36.1 Introduction
36.2 Sampling distribution of estimators
36.3 Desirable properties of estimators
36.4 Confidence intervals
36.5 Hypothesis testing
36.6 Pathologies of frequentist statistics
36.7 Why isn't everyone a Bayesian?

37 Linear algebra
37.1 Introduction
37.2 Basic notation
37.3 Matrix Multiplication
37.4 Operations on matrices
37.5 Properties of matrices
37.6 Special kinds of matrices
37.7 Cholesky decomposition
37.8 QR decomposition
37.9 Eigenvalue decomposition (EVD)
37.10 Singular value decomposition (SVD)

38 Calculus
38.1 Single variable calculus
38.2 Matrix calculus

Some of this math review calls for a handbook:


Mathematics Handbook for Science and Engineering

1 Fundamentals, Discrete Mathematics
1.1 Logic
1.2 Set Theory
1.3 Binary Relations and Functions
1.4 Algebraic Structures
1.5 Graph Theory
1.6 Codes

2 Algebra

3 Geometry and Trigonometry

4 Linear Algebra
4.1 Matrices
4.2 Determinants
4.3 Systems of Linear Equations
4.4 Linear Coordinate Transformation
4.5 Eigenvalues, Diagonalization
4.6 Quadratic Forms
4.7 Linear Spaces
4.8 Linear Mappings
4.9 Tensors
4.10 Complex Matrices

5 The Elementary Functions

6 Differential Calculus (one variable)
6.1 Some Basic Concepts
6.2 Limits and Continuity
6.3 Derivatives
6.4 Monotonicity, Extremes of Functions

7 Integral Calculus
7.1 Indefinite Integrals
7.2 Definite Integrals
7.3 Applications of Differential and Integral Calculus
7.4 Tables of Indefinite Integrals
7.5 Tables of Definite Integrals

8 Sequences and Series
8.1 Sequences of Numbers
8.2 Sequences of Functions
8.3 Series of Constant Terms
8.4 Series of Functions
8.5 Taylor Series
8.6 Special Sums and Series

9 Ordinary Differential Equations (ODE)
9.1 Differential Equations of the First Order
9.2 Differential Equations of the Second Order
9.3 Linear Differential Equations
9.4 Autonomous systems
9.5 General Concepts and Results
9.6 Linear Difference Equations

10 Multidimensional Calculus
10.1 The Space R^n
10.2 Surfaces. Tangent Planes
10.3 Limits and Continuity
10.4 Partial Derivatives
10.5 Extremes of Functions
10.6 Functions f: R^n -> R^m
10.7 Double Integrals
10.8 Triple Integrals
10.9 Partial Differential Equations

11 Vector Analysis
11.1 Curves
11.2 Vector Fields
11.3 Line Integrals
11.4 Surface Integrals

12 Orthogonal Series and Special Functions

13 Transforms

14 Complex Analysis

15 Optimization
15.1 Calculus of Variations
15.2 Linear Optimization
15.3 Integer and Combinatorial Optimization
15.4 Nonlinear Optimization
15.5 Dynamic Optimization

16 Numerical Analysis

17 Probability Theory
17.1 Basic Probability Theory
17.2 Probability Distributions
17.3 Stochastic Processes
17.4 Algorithms for Calculation of Probability Distributions
17.5 Simulation
17.6 Queueing Systems
17.7 Reliability
17.8 Tables

18 Statistics
18.1 Descriptive Statistics
18.2 Point Estimation
18.3 Confidence Intervals
18.4 Table for Confidence Intervals
18.5 Tests of Significance
18.6 Linear Models
18.7 Distribution-free Methods
18.8 Statistical Quality Control
18.9 Factorial Experiments
18.10 Analysis of lifetime (failure time) data
18.11 Statistical glossary