University of Tsukuba | Grad. Scho. Syst. and Info. Eng. | Dept. Comp. Sci. | List of Courses
Advanced Topics in Data Analysis (データ解析特論_E)
Instructor(s)
Yukino Baba, Sho Tsugawa, Taizo Suzuki
E-Mail baba, s-tugawa, taizo (@cs)
URL The course page on Manaba (https://manaba.tsukuba.ac.jp) will be used to distribute materials.
Office hours Please contact the instructors by e-mail.
Course# 01CH738
Area Common
Basic/Advanced
Course style Lecture + PC drills
Term FallAB
Period Thu 5, 6
Room# 3B405
Keywords Data analysis, Statistics
Prerequisites Undergraduate-level probability and statistics.
Related degree program competences Knowledge Utilization Skills, Management Skills, Research Skills, Expert Knowledge
Goal
Outline This course combines lectures and hands-on drills in the R language on a range of data analysis techniques. It covers basic theory, standard techniques, and recently developed advanced methods. The course aims to prepare attendees to interpret, analyze, and make predictions from the various kinds of data encountered in research.
Course plan Weeks 1-4 (Suzuki)
Week 1: Introduction
  • Review of basic probability theory: probability, stochastic event, stochastic variable, probability distribution, probability density function.
  • Introduction to R: installation, language, calculation, data structure, input/output, packages

Week 2: Estimation
  • Estimation of density functions: maximum likelihood estimation (MLE), Bayesian estimation, MLE for mixture distributions (EM algorithm), nonparametric estimation
  • Interval estimation and confidence level

Week 3: Principal Component Analysis (PCA)
  • Covariance (correlation) matrix and principal components, nonlinear (kernel-based) PCA

Week 4: Correlation Analysis and Regression
  • Correlation coefficient
  • Single and multiple regression

Weeks 5-7 (Baba)
Week 5: Bayesian data analysis
  • Bayesian inference and MCMC
  • Probabilistic programming language: Stan

Week 6: Probability distributions
  • Discrete distributions
  • Continuous distributions

Week 7: Practical models
  • Hierarchical models
  • Models with discrete parameters

Weeks 8-10 (Tsugawa)
Week 8: Network Analysis
  • Data with network structure
  • Network visualization
  • Metrics used in network analysis

Week 9: Clustering
  • Clustering utilizing distances among data
  • Network clustering: Clustering utilizing relationships among data
  • Evaluation of clustering

Week 10: Data ranking and evaluation
  • Node ranking utilizing topological structure of networks
  • Evaluation techniques for data ranking used in information retrieval
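Textbook
References
  • Rで学ぶデータサイエンスシリーズ [Learning Data Science with R series], 共立出版 (Kyoritsu Shuppan)
  • Applied Predictive Modeling, Max Kuhn & Kjell Johnson, Springer, 2013
  • StanとRでベイズ統計モデリング [Bayesian Statistical Modeling with Stan and R], 共立出版 (Kyoritsu Shuppan)
  • Doing Bayesian Data Analysis: A Tutorial Introduction with R, John Kruschke, Academic Press, 2014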
Evaluation Based on the total score of the term papers assigned by the lecturers.
    TF / TA
Misc. Each week, the first half of the session will be devoted to a lecture and the second half to drills using R. Please bring a laptop computer that can run an R environment.