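These notes are compiled from the lectures of Prof. Wang Hao (王浩) at ShanghaiTech University.
Prerequisites: mathematical analysis, advanced (linear) algebra, statistics, and optimization.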
Machine learning (Tom M. Mitchell): “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
- What is experience: historical data
- How to learn: learning models and algorithms
- Performance measure: cost functions (error, penalty)
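As a rough illustration of Mitchell's definition (a sketch, not taken from the lecture): the task T is predicting f(x) = x^2 on [0, 1], the experience E is a set of labelled examples, and the performance measure P is the mean squared error on fixed held-out inputs. The 1-nearest-neighbour learner, the target function, and all names below are assumptions chosen only for illustration.

```python
# Task T: predict f(x) = x**2 on [0, 1].
# Experience E: a training set of (x, f(x)) pairs.
# Performance measure P: mean squared error on held-out inputs.

def target(x):
    return x * x

def make_training_set(n):
    """Return n evenly spaced labelled examples on [0, 1] (the experience E)."""
    return [(i / (n - 1), target(i / (n - 1))) for i in range(n)]

def predict(training_set, x):
    """1-nearest-neighbour learner: return the label of the closest stored input."""
    _, nearest_y = min(training_set, key=lambda pair: abs(pair[0] - x))
    return nearest_y

def mean_squared_error(training_set, test_inputs):
    """Performance measure P, evaluated on inputs reserved for testing."""
    errors = [(predict(training_set, x) - target(x)) ** 2 for x in test_inputs]
    return sum(errors) / len(errors)

# Fixed evaluation inputs, kept the same for every training-set size.
test_inputs = [(j + 0.5) / 100 for j in range(100)]

sizes = [3, 9, 33]
errors = [mean_squared_error(make_training_set(n), test_inputs) for n in sizes]

for n, err in zip(sizes, errors):
    print(f"training examples: {n:3d}   test MSE: {err:.6f}")
```

The printed error shrinks as the training set grows, which is what "improves with experience E" means in this toy setting.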
Machine learning, a branch of artificial intelligence, concerns the study and construction of systems that can learn and predict from data.
The core of machine learning deals with representation and generalization:
- Representation (explanation): representing data instances, and the functions evaluated on these instances, is part of every machine learning system.
- Generalization (prediction): the property that the system performs well on unseen data instances.
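The two notions can be contrasted on a toy problem. The sketch below is an illustration under assumptions, not the lecture's example: the data follow y = 3x, one model represents the data as a lookup table (with the mean training label as a fallback), and the other represents it as a line through the origin. Both fit the training set perfectly, but only the second generalizes to unseen inputs. All function names are hypothetical.

```python
# Training data follow y = 3x; the test inputs never appear in training.
train = [(1, 3), (2, 6), (3, 9), (4, 12), (5, 15)]
test = [(6, 18), (7, 21), (8, 24), (9, 27), (10, 30)]

def fit_lookup_table(data):
    """Representation 1: memorize every pair; fall back to the mean label."""
    table = dict(data)
    default = sum(y for _, y in data) / len(data)
    return lambda x: table.get(x, default)

def fit_linear(data):
    """Representation 2: y = w * x, with w from closed-form least squares."""
    w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: w * x

def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

lookup = fit_lookup_table(train)
linear = fit_linear(train)

# (training error, test error) for each representation
results = {
    "lookup": (mse(lookup, train), mse(lookup, test)),
    "linear": (mse(linear, train), mse(linear, test)),
}
print(results)
```

Zero training error alone says nothing about generalization; the test error on unseen inputs is what separates the two models.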
Machine learning tasks are typically classified into three broad categories:
- Supervised learning: The computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs.
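[Semi-supervised learning falls between supervised and unsupervised learning: only some of the training examples carry labels.]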
- Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
- Reinforcement learning: A computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.
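The first two categories can be contrasted on the same one-dimensional data. This is a minimal sketch under assumptions: the data, the nearest-centroid classifier, and the two-cluster k-means routine (initialized at the smallest and largest point) are illustrative choices, not the lecture's. The supervised learner uses the labels; the unsupervised one sees only the inputs. Reinforcement learning is not covered here.

```python
points = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = ["a", "a", "a", "b", "b", "b"]  # used only by the supervised learner

def fit_nearest_centroid(xs, ys):
    """Supervised: average the inputs belonging to each labelled class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(members) / len(members) for label, members in groups.items()}

def classify(centroids, x):
    """Predict the label whose class centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def two_means(xs, iterations=10):
    """Unsupervised: k-means with k = 2; no labels are involved."""
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignment = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                      for x in xs]
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [x for x, a in zip(xs, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, assignment

centroids = fit_nearest_centroid(points, labels)
centers, clusters = two_means(points)

print("supervised centroids:", centroids)
print("prediction for 4.0:", classify(centroids, 4.0))
print("unsupervised centers:", centers, "clusters:", clusters)
```

On this data both approaches recover the same two groups, but only the supervised model can name them "a" and "b"; the clustering returns anonymous indices.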
Learning tasks:
- Classification
- Regression
- Clustering
- Density estimation
- Dimensionality reduction
Methods: regression, decision trees, k-means algorithm, support vector machines, Apriori algorithm, EM algorithm, PageRank, kNN, naive Bayes, neural networks, ...
The difference between machine learning and data mining: the overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use, whereas machine learning is concerned with building systems that learn and predict from data.
Machine learning also has intimate ties to optimization:
- The three pillars: statistical modeling, feature selection, and learning via optimization (Netflix Prize)
- Many learning problems are formulated as minimization of some loss on a training set of examples
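To make "minimization of some loss on a training set" concrete, here is a minimal sketch, assuming a linear model y = w*x + b, a mean squared error loss, and plain gradient descent with a fixed step size. The toy data, step size, and function names are illustrative assumptions, not values from the lecture.

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data: true w = 2, b = 1

def loss(w, b):
    """Mean squared error of the model y = w * x + b on the training set."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, b):
    """Partial derivatives of the loss with respect to w and b."""
    n = len(xs)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def gradient_descent(step_size=0.05, steps=2000):
    """Start from w = b = 0 and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    history = [loss(w, b)]
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= step_size * dw
        b -= step_size * db
        history.append(loss(w, b))
    return w, b, history

w, b, history = gradient_descent()
print(f"w = {w:.4f}, b = {b:.4f}")
print(f"loss: {history[0]:.4f} -> {history[-1]:.2e}")
```

The step size matters: for this data the iteration diverges once the step size exceeds roughly 0.149 (two divided by the largest eigenvalue of the loss Hessian), which is one reason line search and convergence rates matter in practice.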
Optimization algorithms/techniques:
- Sparse optimization
- Iteratively reweighted least squares algorithm (IRLS)
- Gradient descent methods
- Online gradient methods
- Stochastic gradient methods
- Newton's method
- Quasi-Newton method (BFGS)
- Limited-memory BFGS
- Coordinate descent
- Alternating direction method of multipliers (ADMM)
- Penalty method, augmented Lagrangian
- Gradient projection method
- Iterative thresholding method (IST)
- Active set method
- Recursive least squares
- Line search, convergence rate, duality, KKT/optimality conditions
References:
1. For machine learning methods: “Machine Learning: A Probabilistic Perspective”, Kevin P. Murphy, MIT Press.
2. For optimization knowledge: “Numerical Optimization”, Jorge Nocedal and Stephen J. Wright, 2nd edition, Springer.
3. For optimization techniques in machine learning: “Optimization for Machine Learning”, Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright, MIT Press.
Some lectures will be based on these books, but not all of them. Reading the textbooks is recommended but not required. You are not responsible for textbook material that is not covered in lecture.