University of Tsukuba | Graduate School of Systems and Information Engineering | Department of Computer Science
Advanced Data Analysis (データ解析特論_E)
Instructor(s) Keisuke Kameyama, Hideitsu Hino, Sho Tsugawa
E-Mail keisuke, hinohide, s-tugawa (@cs)
URL The course page on manaba (https://manaba.tsukuba.ac.jp) will be used to distribute materials.
Office hours Please contact the instructors by e-mail.
Course# 01CH738, 01CJ235
Area Common
Basic/Advanced
Course style Lecture + PC drills
Term FallAB
Period Thu 5, 6
Room# 3B405
Keywords Data analysis, Statistics
Prerequisites Undergraduate-level probability and statistics.
Goal
Outline This course consists of lectures and hands-on drills in the R language covering a range of data analysis techniques, from basic theory and standard methods to recently developed advanced methods. It aims to prepare students to interpret, analyze, and make predictions from the kinds of data they encounter in research.
Course plan Weeks 1-4 (Kameyama)
Week 1: Introduction
  • Review of basic probability theory: probability, random events, random variables, probability distributions, probability density functions
  • Introduction to R: installation, language basics, calculation, data structures, input/output, packages

Week 2: Estimation
  • Estimation of density functions: maximum likelihood estimation (MLE), Bayesian estimation, MLE of mixture distributions (EM algorithm), nonparametric estimation
  • Interval estimation and confidence levels

Week 3: Principal Component Analysis (PCA)
  • Covariance (correlation) matrices and principal components; nonlinear (kernel-based) PCA

Week 4: Correlation Analysis and Regression
  • Correlation coefficient
  • Simple and multiple regression

Weeks 5-7 (Hino)
Week 5: Data Handling
  • Pre- and post-processing, stratification, splitting, various visualizations
  • Data scaling, normalization, outlier removal, and treatment of missing values ("NA")
  • Common data analysis procedures such as stratified sampling and cross-validation

Week 6: Statistical Tests and Power Analysis
  • Introduction to experimental design
  • Introduction to statistical testing
  • How many samples are required to perform reasonable tests and inference

Week 7: Sampling Methods
  • Introduction to computer-intensive statistics
  • Use of sampling methods
  • Confidence interval of the mean, bootstrap methods

Weeks 8-10 (Tsugawa)
Week 8: Network Analysis
  • Data with network structure
  • Network visualization
  • Metrics used in network analysis

Week 9: Clustering
  • Clustering utilizing distances among data
  • Network clustering: Clustering utilizing relationships among data
  • Evaluation of clustering

Week 10: Data Ranking and Evaluation
  • Node ranking utilizing topological structure of networks
  • Evaluation techniques for data ranking used in information retrieval
Textbook
References Rで学ぶデータサイエンス (Data Science with R) series, Kyoritsu Shuppan
Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2013
Evaluation Based on the total score of the term papers assigned by the instructors.
TF / TA
Misc. Each week, the first half of the class is devoted to lecture and the second half to drills in R. Please bring a laptop computer that can run an R environment.