Packages of Epidemiology

Last Updated: 2020-04-26

流行疫学（Epidemiology）の R Package の、使い方を学ぶつもりで、検索をすると、あまりに、たくさんのものがあることが判明した。ここに挙げるのはほんの一部である。過去の、データを集めたものもあるが、現時点での、Covid-19 の状況にあわせて、利用してみることにする。
これは、学習の記録で、疫学について、または、Covid-19 について、何らかの情報を提供することを目的としていない。データの取得などの、実例と、Package について、調べた備忘録的なものである。

十分理解できているわけではないが、下でも紹介する incidence Package は、使いやすく、興味深い。背景の理論もすこし勉強してみたい。

1. R Epidemic Consortium

R の Software などのリンクがある。このサイトがベストかどうかは不明である。

Description: The R Epidemics Consortium (RECON) is an international not-for-profit, non-governmental organisation gathering experts in data science, modelling methodology, public health, and software development to create the next generation of analytics tools for informing the response to disease outbreaks, health emergencies and humanitarian crises, using the R software and other free, open-source resources.
web: RECON

2. R Packages

流行疫学（Epidemiology）に関する、R の Package をいくつかリストする。これらは、RECON にリストされているものもあるが、されていないものもある。

2.1 epimdr

説明にあるように、本や、coursera（MOOCs）のコースに関連した、例などを含んでいる。現時点では、コースは受講していない。

Description: Functions, data sets and shiny apps for “Epidemics: Models and Data in R” by Ottar N. Bjornstad (ISBN 978-3-319-97487-3) https://www.springer.com/gp/book/9783319974866. The package contains functions to study the S(E)IR model, spatial and age-structured SIR models; time-series SIR and chain-binomial stochastic models; catalytic disease models; coupled map lattice models of spatial transmission and network models for social spread of infection. The package is also an advanced quantitative companion to the coursera Epidemics Massive Online Open Course https://www.coursera.org/learn/epidemics.
web: epimdr: Functions and Data for “Epidemics: Models and Data in R”
Manual: epimdr
Book: Epidemics - Presents cutting-edge approaches to understanding infectious disease dynamics in the face of changing demographics

2.2 earlyR

3.1 Tim Churches のブログで紹介されており、Basic Reproduction Number \(R_0\) を推定できそうなので、使ってみた。（earlyR 応用例）しかし、知識も理解も十分ではないので、いずれ、もう少し学んでから再挑戦。

Description: Implements a simple, likelihood-based estimation of the reproduction number (R0) using a branching process with a Poisson likelihood. This model requires knowledge of the serial interval distribution, and dates of symptom onsets. Infectiousness is determined by weighting R0 by the probability mass function of the serial interval on the corresponding day. It is a simplified version of the model introduced by Cori et al. (2013) doi:10.1093/aje/kwt133.
web: earlyR: Estimation of Transmissibility in the Early Stages of a Disease Outbreak
Manual: earlyR
Homepage: Welcome to the earlyR package!
Function: get_R: Estimate the Reproduction Number
- This function estimates the (most of the time, ’basic’) reproduction number (R) using i) the known distribution of the Serial Interval (delay between primary to secondary onset) and ii) incidence data.
- The estimation of R relies on all available incidence data. As such, all zero incidence after the first case should be included in x. When using inidence from the ’incidence’ package, make sure you use the argument last_date to indicate where the epicurve stops, otherwise the curve is stopped after the last case. Use as.data.frame to double-check that the epicurve includes the last ’zeros’.

2.3 EpiEstim

3.1 Tim Churches のブログで紹介されており、Basic Reproduction Number \(R_0\) を推定するなどできそうなので、使ってみたが、知識も理解も十分ではないので、いずれ、もう少し学んでから再挑戦。

Description: Tools to quantify transmissibility throughout an epidemic from the analysis of time series of incidence as described in Cori et al. (2013) doi:10.1093/aje/kwt133 and Wallinga and Teunis (2004) doi:10.1093/aje/kwh255.
web: EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves
Manual: EpiEstim
vignette: Using EpiEstim

2.4 EpiILM

Description: Provides tools for simulating from discrete-time individual level models for infectious disease data analysis. This epidemic model class contains spatial and contact-network based models with two disease types: Susceptible-Infectious (SI) and Susceptible- Infectious-Removed (SIR).
web: EpiILM: Spatial and Network Based Individual Level Models for Epidemics
Manual: EpiILM

2.5 EpiModel

Description: EpiModel is an R package that provides tools for simulating and analyzing mathematical models of infectious disease dynamics. Supported epidemic model classes include deterministic compartmental models, stochastic individual contact models, and stochastic network models. Disease types include SI, SIR, and SIS epidemics with and without demography, with utilities available for expansion to construct and simulate epidemic models of arbitrary complexity. The network model class is based on the statistical framework of temporal exponential random graph models (ERGMs) implementated in the Statnet suite of software for R.
web: EpiModel: Mathematical Modeling of Infectious Disease Dynamics
Manual: EpiModel
vignette: EpiModel: Mathematical Modeling of Infectious Disease Dynamics
EpiModel Home: EpiModel

2.6 EpiCurv

Description: Creates simple or stacked epidemic curves for hourly, daily, weekly or monthly outcome data.
web: EpiCurve: Plot an Epidemic Curve
Manual: EpiCurv
vignette: Using EpiCurve

2.7 epiR

Description: Tools for the analysis of epidemiological data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, and computing confidence intervals around incidence risk and incidence rate estimates. Miscellaneous functions for use in meta-analysis, diagnostic test interpretation, and sample size calculations.
web: epiR: Tools for the Analysis of Epidemiological Data
Manual: epiR
vignette: Descriptive Epidemiology using epiR

2.7 Incidence

Description: Provides functions and classes to compute, handle and visualise incidence from dated events for a defined time interval. Dates can be provided in various standard formats. The class ‘incidence’ is used to store computed incidence and can be easily manipulated, subsetted, and plotted. In addition, log-linear models can be fitted to ‘incidence’ objects using ‘fit’. This package is part of the RECON (http://www.repidemicsconsortium.org/) toolkit for outbreak analysis.
web: incidence: Compute, Handle, Plot and Model Incidence of Dated Events
Manual: Incidence
vignette: Customize plots of incidence

3. Website

Covid-19 のこともあり、サイトは限りなくある。例として、R Package を利用するために、参考にしたサイトをリストする。

3.1 COVID-19 epidemiology with R

Code も含まれていたので、実験をしてみた。（earlyR 応用例）時間をとって、いずれまた理解を深めたい。

web: COVID-19 epidemiology with R by by Tim Churches
blog: Tim Churches Health Data Science Blog

3.2 Learning Machines

web: Learning Machines, A blog about data, science, and learning machines – like us

4. Examples of Analysis

library(tidyverse)
library(rvest)

4.1 Collection of Data

4.1.1 Incidence data collated by John Hopkins University

おそらく、Covid-19 関連では、世界で、最も利用されているデータ。

Data Source

Github:CSSEGISandData/COVID-19, 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
Covid-19 Dashboard
Global Health Security Index

ブログ 3.1 COVID-19 epidemiology with R の Code を確認する。

時間も経過しており、コードを変更しないといけない部分もあった。

We follow the following page. However, the url of the website is changed and hence revised.

web: COVID-19 epidemiology with R

Several countries, i.e., Australia, Canada, China, Denmark, France, Netherlands, United Kingtdom, are divided into regions.

library(tidyverse)
library(lubridate)
library(gridExtra)
# library(kableExtra)
#######
jhu_url <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
covid19_global <- read_csv(jhu_url)
covid19_six <- covid19_global %>% 
  rename(Province = "Province/State", Country_Region = "Country/Region") %>% 
  pivot_longer(-c(Province, Country_Region, Lat, Long), 
               names_to = "Date", values_to = "Cumulative_Cases") %>%
  mutate(Reported_Date = as.Date(Date, "%m/%d/%y")) %>% 
  mutate(Daily_Confirmed_Cases = c(0, diff(Cumulative_Cases))) %>% 
  select(Province, Country_Region, Reported_Date, Cumulative_Cases, Daily_Confirmed_Cases) %>%
  filter(Country_Region %in% c("Japan", "Korea, South", "Germany", "Italy", "Spain", "US")) 
japan <- covid19_six %>% filter(Country_Region == "Japan")
jd <- japan %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Japan: Daily Confirmed Cases")
jc <- japan %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Japan: Cumulative Cases")
grid.arrange(jd, jc, ncol=2)

korea <- covid19_six %>% filter(Country_Region == "Korea, South")
kd <- korea %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Korea: Daily Confirmed Cases")
kc <- korea %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Korea: Cumulative Cases")
grid.arrange(kd, kc, ncol=2)

germany <- covid19_six %>% filter(Country_Region == "Germany")
gd <- germany %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Germany: Daily Confirmed Cases")
gc <- germany %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Germany: Cumulative Cases")
grid.arrange(gd, gc, ncol=2)

italy <- covid19_six %>% filter(Country_Region == "Italy")
id <- italy %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Italy: Daily Confirmed Cases")
ic <- italy %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Italy: Cumulative Cases")
grid.arrange(id, ic, ncol=2)

spain <- covid19_six %>% filter(Country_Region == "Spain")
sd <- spain %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col()+ 
  ggtitle("Spain: Daily Confirmed Cases")
sc <- spain %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Spain: Cumulative Cases")
grid.arrange(sd, sc, ncol=2)

us <- covid19_six %>% filter(Country_Region == "US")
usd <- us %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col()+ 
  ggtitle("US: Daily Confirmed Cases")
usc <- us %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("US: Cumulative Cases")
grid.arrange(usd, usc, ncol=2)

4.1.2 Wikipedia

個別の発症者のリスト、Line List が含まれている。

ブログ 3.1 COVID-19 epidemiology with R の Code を確認する。

web scraping をして、それから、wrangling 方法が書かれている。時間も経過しており、最初から、コードも書き直しが必要。また、Data Wrangling（整理）の部分の Code は、すべては、書かれていない。

In the blog above, R script reading the timeline in the wikipedia using rvest package is explained but the code is not perfect and could not use the code to clean the data.

4.1.3 Our World in Data

非常に興味深い様々なデータとそのグラフが掲載されている。データも公開されている。

4.1.4 Japanese Cases

新型コロナウイルス感染症の現在の状況と厚生労働省の対応について（令和２年３月６日版）
Examples の中で紹介したコードと基本的に同じ。

library(tidyverse) 
library(rvest)
library(stringr)
library(lubridate)
library(scales)
url <- "https://www.mhlw.go.jp/stf/newpage_10022.html"
h <- read_html(url)
tab <- h %>% html_nodes("table")
tab_0 <- tab[[3]] %>% html_table 
dat_0 <- tab_0
colnames_0 <- dat_0[1,]
colnames(dat_0) <- colnames_0
dat_0 <- dat_0[2:nrow(dat_0),]
dat_0 <- dat_0[,3:6]
tab_1 <- tab[[4]] %>% html_table 
dat_1 <- tab_1
colnames_1 <- dat_1[1,]
colnames(dat_1) <- colnames_1
dat_1 <- dat_1[2:nrow(dat_1),]
dat_1 <- dat_1[,4:7]
dat <- bind_rows(dat_0, dat_1) %>% arrange(確定日)
dat$居住地 <- dat$居住地 %>% str_replace(pattern = "\r\n\t\t\t\t", replacement = "")
rownames(dat) <- 1:nrow(dat)
dat$確定日 <- as.Date(dat$確定日, "%m/%d")
# dat[dat$年代 %in% c("10代未満", "305"),]
dat$性別[dat$年代 == "10代未満"] <- "男"
dat$年代[dat$年代 == "10代未満"] <- "10歳未満"
dat$年代[dat$年代 == "305"] <- "30代"
data_jp <- dat %>% filter(居住地 != "中国（武漢市）") %>% 
  filter(居住地 != "中国（湖南省）") %>% filter(居住地 != "中国")
data_jp

4.1.5 ジャッグジャパン Jag-Japan

2020年2月17日に公開されたようで、日本でのダッシュボードによる情報提供の最初であると思われる。個人的には、3月に入ってから、この存在を知る。ひとり一人の Line List を含んでおり、日本のもので、まとまったものとしては、一番、情報量が多い。しかし、本来は、まずは、厚生労働省のサイトで、このようなデータを、公開すべきなのだろう。

jag_url <- "https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv"
jag <- read_csv(jag_url)

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   通し = col_double(),
##   市町村内症例番号 = col_double(),
##   人数 = col_double(),
##   累計 = col_double(),
##   前日比 = col_double(),
##   発症数 = col_double(),
##   死者合計 = col_double(),
##   退院数累計 = col_double(),
##   退院数 = col_double(),
##   PCR検査実施人数 = col_double(),
##   PCR検査前日比 = col_double(),
##   X = col_double(),
##   Y = col_double(),
##   Field4 = col_logical(),
##   Field5 = col_logical(),
##   Field6 = col_logical(),
##   Field7 = col_logical(),
##   Field8 = col_logical(),
##   Field9 = col_logical(),
##   Field10 = col_logical()
## )

## See spec(...) for full column specifications.

## Warning: 6 parsing failures.
##  row              col expected        actual                                                               file
## 4464 市町村内症例番号 a double 神戸59/西宮36 'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4465 市町村内症例番号 a double 神戸60/西宮37 'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4830 市町村内症例番号 a double 千葉無症状70  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4831 市町村内症例番号 a double 千葉無症状71  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 5921 市町村内症例番号 a double 千葉無症状72  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## .... ................ ........ ............. ..................................................................
## See problems(...) for more details.

str(jag)

## tibble [13,259 × 51] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ 通し              : num [1:13259] 1 2 3 4 5 6 7 8 9 10 ...
##  $ 厚労省NO          : chr [1:13259] "1" "2" "3" "4" ...
##  $ 無症状病原体保有者: chr [1:13259] NA NA NA NA ...
##  $ 国内              : chr [1:13259] "A-1" "A-2" "A-3" "A-4" ...
##  $ チャーター便      : chr [1:13259] NA NA NA NA ...
##  $ 年代              : chr [1:13259] "30" "40" "30" "40" ...
##  $ 性別              : chr [1:13259] "男性" "男性" "女性" "男性" ...
##  $ 確定日            : chr [1:13259] "1/15/2020" "1/24/2020" "1/25/2020" "1/26/2020" ...
##  $ 発症日            : chr [1:13259] "1/3/2020" "1/14/2020" "1/21/2020" "1/23/2020" ...
##  $ 受診都道府県      : chr [1:13259] "神奈川県" "東京都" "東京都" "愛知県" ...
##  $ 居住都道府県      : chr [1:13259] "神奈川県" "中華人民共和国" "中華人民共和国" "中華人民共和国" ...
##  $ 居住管内          : chr [1:13259] NA NA NA NA ...
##  $ 居住市区町村      : chr [1:13259] NA NA NA NA ...
##  $ キー              : chr [1:13259] "神奈川県" "中華人民共和国" "中華人民共和国" "中華人民共和国" ...
##  $ 発表              : chr [1:13259] "神奈川県" "東京都" "東京都" "愛知県" ...
##  $ 都道府県内症例番号: chr [1:13259] "1" "1" "2" "1" ...
##  $ 市町村内症例番号  : num [1:13259] NA NA NA NA NA NA NA NA NA NA ...
##  $ ステータス        : chr [1:13259] "退院" "退院" "退院" NA ...
##  $ 備考              : chr [1:13259] NA NA NA "国籍:中国" ...
##  $ ソース            : chr [1:13259] "https://www.mhlw.go.jp/stf/newpage_08906.html" "https://www.mhlw.go.jp/stf/newpage_09079.html" "https://www.mhlw.go.jp/stf/newpage_09099.html" "https://www.mhlw.go.jp/stf/newpage_09100.html" ...
##  $ ソース2           : chr [1:13259] "https://www.pref.kanagawa.jp/docs/ga4/bukanshi/occurrence.html" "https://www.metro.tokyo.lg.jp/tosei/hodohappyo/press/2020/01/24/20.html" "https://www.metro.tokyo.lg.jp/tosei/hodohappyo/press/2020/01/27/24.html" "https://www.pref.aichi.jp/uploaded/attachment/321138.pdf" ...
##  $ ソース3           : chr [1:13259] NA NA NA NA ...
##  $ 人数              : num [1:13259] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 累計              : num [1:13259] 1 2 3 4 NA NA 7 8 NA NA ...
##  $ 前日比            : num [1:13259] 1 1 1 1 NA NA 3 1 NA NA ...
##  $ 発症数            : num [1:13259] 0 2 2 3 NA NA 1 2 NA NA ...
##  $ 死者合計          : num [1:13259] NA NA NA NA NA NA NA NA NA NA ...
##  $ 退院数累計        : num [1:13259] 1 NA NA NA NA NA NA NA NA NA ...
##  $ 退院数            : num [1:13259] 1 NA NA NA NA NA NA NA NA NA ...
##  $ PCR検査実施人数   : num [1:13259] NA NA NA NA NA NA NA NA NA NA ...
##  $ PCR検査前日比     : num [1:13259] NA NA NA NA NA NA NA NA NA NA ...
##  $ 職業_正誤確認用   : chr [1:13259] NA NA NA NA ...
##  $ 勤務先_正誤確認用 : chr [1:13259] NA NA NA NA ...
##  $ Hospital Pref     : chr [1:13259] "Kanagawa" "Tokyo" "Tokyo" "Aichi" ...
##  $ Residential Pref  : chr [1:13259] "Kanagawa" "China(Mainland)" "China(Mainland)" "China(Mainland)" ...
##  $ Release           : chr [1:13259] "Kanagawa Prefecture" "Tokyo Metropolitan Government" "Tokyo Metropolitan Government" "Aichi Prefecture" ...
##  $ Gender            : chr [1:13259] "Male" "Male" "Female" "Male" ...
##  $ X                 : num [1:13259] 140 116 116 116 116 ...
##  $ Y                 : num [1:13259] 35.4 39.9 39.9 39.9 39.9 ...
##  $ 確定日YYYYMMDD    : chr [1:13259] "2020/1/15" "2020/1/24" "2020/1/25" "2020/1/26" ...
##  $ 受診都道府県コード: chr [1:13259] "14" "13" "13" "23" ...
##  $ 居住都道府県コード: chr [1:13259] "14" NA NA NA ...
##  $ 更新日時          : chr [1:13259] "4/26/2020 16:54" NA NA NA ...
##  $ Field2            : chr [1:13259] NA NA NA NA ...
##  $ Field4            : logi [1:13259] NA NA NA NA NA NA ...
##  $ Field5            : logi [1:13259] NA NA NA NA NA NA ...
##  $ Field6            : logi [1:13259] NA NA NA NA NA NA ...
##  $ Field7            : logi [1:13259] NA NA NA NA NA NA ...
##  $ Field8            : logi [1:13259] NA NA NA NA NA NA ...
##  $ Field9            : logi [1:13259] NA NA NA NA NA NA ...
##  $ Field10           : logi [1:13259] NA NA NA NA NA NA ...
##  - attr(*, "problems")= tibble [6 × 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:6] 4464 4465 4830 4831 5921 6003
##   ..$ col     : chr [1:6] "市町村内症例番号" "市町村内症例番号" "市町村内症例番号" "市町村内症例番号" ...
##   ..$ expected: chr [1:6] "a double" "a double" "a double" "a double" ...
##   ..$ actual  : chr [1:6] "神戸59/西宮36" "神戸60/西宮37" "千葉無症状70" "千葉無症状71" ...
##   ..$ file    : chr [1:6] "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   通し = col_double(),
##   ..   厚労省NO = col_character(),
##   ..   無症状病原体保有者 = col_character(),
##   ..   国内 = col_character(),
##   ..   チャーター便 = col_character(),
##   ..   年代 = col_character(),
##   ..   性別 = col_character(),
##   ..   確定日 = col_character(),
##   ..   発症日 = col_character(),
##   ..   受診都道府県 = col_character(),
##   ..   居住都道府県 = col_character(),
##   ..   居住管内 = col_character(),
##   ..   居住市区町村 = col_character(),
##   ..   キー = col_character(),
##   ..   発表 = col_character(),
##   ..   都道府県内症例番号 = col_character(),
##   ..   市町村内症例番号 = col_double(),
##   ..   ステータス = col_character(),
##   ..   備考 = col_character(),
##   ..   ソース = col_character(),
##   ..   ソース2 = col_character(),
##   ..   ソース3 = col_character(),
##   ..   人数 = col_double(),
##   ..   累計 = col_double(),
##   ..   前日比 = col_double(),
##   ..   発症数 = col_double(),
##   ..   死者合計 = col_double(),
##   ..   退院数累計 = col_double(),
##   ..   退院数 = col_double(),
##   ..   PCR検査実施人数 = col_double(),
##   ..   PCR検査前日比 = col_double(),
##   ..   職業_正誤確認用 = col_character(),
##   ..   勤務先_正誤確認用 = col_character(),
##   ..   `Hospital Pref` = col_character(),
##   ..   `Residential Pref` = col_character(),
##   ..   Release = col_character(),
##   ..   Gender = col_character(),
##   ..   X = col_double(),
##   ..   Y = col_double(),
##   ..   確定日YYYYMMDD = col_character(),
##   ..   受診都道府県コード = col_character(),
##   ..   居住都道府県コード = col_character(),
##   ..   更新日時 = col_character(),
##   ..   Field2 = col_character(),
##   ..   Field4 = col_logical(),
##   ..   Field5 = col_logical(),
##   ..   Field6 = col_logical(),
##   ..   Field7 = col_logical(),
##   ..   Field8 = col_logical(),
##   ..   Field9 = col_logical(),
##   ..   Field10 = col_logical()
##   .. )

jag

4.1.6 Toyo Keizai 東洋経済データ

すっきりしていて、見やすい。情報が十分であるとは、思わないが、いくつもの、観点から情報をみることができ、コミュニケーションの面からも、すぐれている。

東洋経済 Online: 新型コロナウイルス国内感染の状況
Github: kaz-ogiwara/covid19, 新型コロナウイルス感染症（COVID-19）の国内における状況を厚生労働省の報道発表資料からビジュアルにまとめた。

library(jsonlite)
toyokeizai <- fromJSON('https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/data.json')
str(toyokeizai)

## List of 5
##  $ updated         :List of 4
##   ..$ last       :List of 2
##   .. ..$ ja: chr "最終更新：2020年4月26日"
##   .. ..$ en: chr "Last updated: 26 April 2020"
##   ..$ transition :List of 2
##   .. ..$ ja: chr "4月26日 12:00時点"
##   .. ..$ en: chr "As of 26 April 12:00"
##   ..$ prefectures:List of 2
##   .. ..$ ja: chr "4月26日 12:00時点"
##   .. ..$ en: chr "As of 26 April 12:00"
##   ..$ demography :List of 2
##   .. ..$ ja: chr "4月25日 18:00時点"
##   .. ..$ en: chr "As of 25 April 18:00"
##  $ transition      :List of 6
##   ..$ carriers  : chr [1:70, 1:6] "2020" "2020" "2020" "2020" ...
##   ..$ cases     : int [1:70, 1:4] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ discharged: chr [1:70, 1:5] "2020" "2020" "2020" "2020" ...
##   ..$ serious   : int [1:70, 1:4] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ deaths    : chr [1:70, 1:5] "2020" "2020" "2020" "2020" ...
##   ..$ pcrtested : int [1:54, 1:4] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ demography      : int [1:10, 1:3] 129 70 29 12 5 2 0 0 0 4 ...
##  $ prefectures-map :'data.frame':    47 obs. of  4 variables:
##   ..$ code : int [1:47] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ ja   : chr [1:47] "北海道" "青森県" "岩手県" "宮城県" ...
##   ..$ en   : chr [1:47] "Hokkaido" "Aomori" "Iwate" "Miyagi" ...
##   ..$ value: int [1:47] 601 22 0 85 16 66 68 158 52 140 ...
##  $ prefectures-data:List of 4
##   ..$ discharged: int [1:39, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ deaths    : int [1:39, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ carriers  : int [1:46, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ pcrtested : int [1:46, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...

# head(toyokeizai$transition$cases)
cases <- as.data.frame(toyokeizai$transition$cases)
colnames(cases) <- c("Y", "M", "D", "Cumulative_Cases")
toyokeizai_cases <- cases %>% 
  mutate(Date = as.Date(paste(cases$Y, cases$M, cases$D, sep = "-"), "%Y-%m-%d")) %>%
  mutate(Daily_Cases = c(0, diff(Cumulative_Cases))) %>%
  select(Date, Cumulative_Cases, Daily_Cases)
toyokeizai_cases

4.2 Estimation of the Reproduction Number

一人の感染者が何人の感染者を生み出すかが、Reproduction Number である。最初は、それが、新たな感染者となるが、抗体が生じているひとが多くなると、新たな感染者とはならないため、Reproduction Number は減少する。最初のものを、\(R_0\) と書き、Basic Reproduction Number という。この数とともに、Generation Period として、感染させる可能性が何日維持されるかも重要である。Covid-19 では、感染しても、発症しない、または、軽症で治癒するひとがかなりの割合おり、そのひとたちも、感染源となり得るとされており（このひとたちの Reproduction Number は不明）、感染の広がりを見えなくしている。Basic Reproduction Number も、Generation Period も、実際には、確定が非常に困難である。以下では、検査で陽性だと判明するひとを捕捉することになるが、正確な値をえることは、原理的に不可能であるように思われる。
さらに、検査の実施状況、検査の正確さなども、影響し、なにが把握できて、なにが把握できないかを把握することも困難である。

The reproductive number of COVID-19 is higher compared to SARS coronavirus
- このレポートによると Esitimation はかなり幅があるが、3.0 程度と思われる。
R_t Covid19
- アメリカの各週 Reproduction Number を Real Time で掲載しているサイト
Preliminary estimating the reproduction number of the coronavirus disease (COVID-19) outbreak in Republic of Korea and Italy by 5 March 2020

4.2.1 earlyR

In the folowing we follow earlyR homepage. The earlyR package uses incidence package. See earlyR above.

Note: It is very important to make sure that the last days without cases are included here. Omitting this information would lead to an over-estimation of the reproduction number (R).

このようにあるので、現時点で EarlyR Package で Reproduction Number を、Estimate することは、できない。また、パラメターとして、使用する、mu と sigma も不明である。例として掲載されている例ともあまりにもかけ離れている。

3.1 の Article では、mean = 5.0 days, standard deviation = 3.4 としているので、これを採用することにする。適用する地域が異なるときに、これでよいかは不明。

Data: data_jp

Japanese data as of March 7, 2020 of Covid-19.

library(earlyR)
library(incidence)
onset <- data_jp$確定日
today <- as.Date("2020-03-07")
i <- incidence(onset, last_date = today)

## 3 missing observations were removed.

## <incidence object>
## [296 cases from days 2020-01-15 to 2020-03-07]
## 
## $counts: matrix with 53 rows and 1 columns
## $n: 296 cases in total
## $dates: 53 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 53 days
## $cumulative: FALSE

plot(i, border = "white")

mu <- 5 # mean in days days
sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 1.141141
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.07471344...
## 
##  // $dates:
## [1] "2020-01-16" "2020-01-17" "2020-01-18" "2020-01-19" "2020-01-20"
## [6] "2020-01-21"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312

plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.951   1.101   1.141   1.142   1.191   1.361

quantile(R_val, c(0.025, 0.975)) # 95% credibility interval

##     2.5%    97.5% 
## 1.021021 1.271271

hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Johns Hopkins University Data

First we define a function to find the onset.

date <- as_date(c("2020-01-15", "2020-02-10", "2020-02-12", "2020-02-13"))
class(date)

## [1] "Date"

n <- c(1, 1, 3, 5)

datelist <- function(date, n){
  dat <- date[1]
  for(i in 2:length(date)) dat <- c(dat, rep(date[i],n[i]))
  dat
}

datelist(date, n)

##  [1] "2020-01-15" "2020-02-10" "2020-02-12" "2020-02-12" "2020-02-12"
##  [6] "2020-02-13" "2020-02-13" "2020-02-13" "2020-02-13" "2020-02-13"

Japan

#library(earlyR)
#library(incidence)
onset <- datelist(japan$Reported_Date, japan$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-20")
i <- incidence(onset, last_date = today)

## 12268 observations outside of [2020-01-22, 2020-03-20] were removed.

## <incidence object>
## [962 cases from days 2020-01-22 to 2020-03-20]
## 
## $counts: matrix with 59 rows and 1 columns
## $n: 962 cases in total
## $dates: 59 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 59 days
## $cumulative: FALSE

plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 1.131131
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.455961 0.4201881...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312

plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.021   1.111   1.131   1.132   1.161   1.281

hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Korea

#library(earlyR)
#library(incidence)
onset <- datelist(korea$Reported_Date, korea$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-02-21")
i <- incidence(onset, last_date = today)

## 10524 observations outside of [2020-01-22, 2020-02-21] were removed.

## <incidence object>
## [204 cases from days 2020-01-22 to 2020-02-21]
## 
## $counts: matrix with 31 rows and 1 columns
## $n: 204 cases in total
## $dates: 31 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 31 days
## $cumulative: FALSE

plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.762763
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.3297552 0.2962323 0.4271719 0.5502179...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312

plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.012   2.623   2.753   2.758   2.883   3.443

hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Germany

#library(earlyR)
#library(incidence)
onset <- datelist(germany$Reported_Date, germany$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-01")
i <- incidence(onset, last_date = today)

## 156383 observations outside of [2020-01-22, 2020-03-01] were removed.

## <incidence object>
## [131 cases from days 2020-01-22 to 2020-03-01]
## 
## $counts: matrix with 40 rows and 1 columns
## $n: 131 cases in total
## $dates: 40 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 40 days
## $cumulative: FALSE

plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.242242
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.2539856...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312

plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.742   2.112   2.252   2.256   2.382   2.913

hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Italy

#library(earlyR)
#library(incidence)
onset <- datelist(italy$Reported_Date, italy$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-02-25")
i <- incidence(onset, last_date = today)

## 195029 observations outside of [2020-01-22, 2020-02-25] were removed.

## <incidence object>
## [323 cases from days 2020-01-22 to 2020-02-25]
## 
## $counts: matrix with 35 rows and 1 columns
## $n: 323 cases in total
## $dates: 35 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 35 days
## $cumulative: FALSE

plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.442442
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.07471344...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312

plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.062   2.342   2.442   2.445   2.533   2.973

hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Spain

#library(earlyR)
#library(incidence)
onset <- datelist(spain$Reported_Date, spain$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-01")
i <- incidence(onset, last_date = today)

## 223675 observations outside of [2020-01-22, 2020-03-01] were removed.

## <incidence object>
## [85 cases from days 2020-01-22 to 2020-03-01]
## 
## $counts: matrix with 40 rows and 1 columns
## $n: 85 cases in total
## $dates: 40 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 40 days
## $cumulative: FALSE

plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.572573
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.07471344...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312

plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.752   2.412   2.603   2.611   2.783   3.734

hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

#library(earlyR)
#library(incidence)
onset <- datelist(us$Reported_Date, us$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-10")
i <- incidence(onset, last_date = today)

## 937195 observations outside of [2020-01-22, 2020-03-10] were removed.

## <incidence object>
## [959 cases from days 2020-01-22 to 2020-03-10]
## 
## $counts: matrix with 49 rows and 1 columns
## $n: 959 cases in total
## $dates: 49 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 49 days
## $cumulative: FALSE

plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.072072
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.3297552 0.2962323 0.7857161 0.7164204...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312

plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.862   2.022   2.072   2.075   2.122   2.352

hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

4.3 EpiEstim

3.1 Tim Churches のブログで紹介されている。（earlyR 応用例）しかし、知識も理解も十分ではないので、いずれ、もう少し学んでから再挑戦。時間をみつけて、取り組む。

We follow EpiEstim: vignette. See EpiEstim above.

4.4 Incidence

incidence package を用いた例を下につける。

Epidemic curves made easy using the R package incidence の方法を適用たものである。

参考：Incidence: compute, handle, visualise, and model incidence of dated events

4.4.1 Analysis of Japan using jaga data

jag_url <- "https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv"
jag <- read_csv(jag_url)

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   通し = col_double(),
##   市町村内症例番号 = col_double(),
##   人数 = col_double(),
##   累計 = col_double(),
##   前日比 = col_double(),
##   発症数 = col_double(),
##   死者合計 = col_double(),
##   退院数累計 = col_double(),
##   退院数 = col_double(),
##   PCR検査実施人数 = col_double(),
##   PCR検査前日比 = col_double(),
##   X = col_double(),
##   Y = col_double(),
##   Field4 = col_logical(),
##   Field5 = col_logical(),
##   Field6 = col_logical(),
##   Field7 = col_logical(),
##   Field8 = col_logical(),
##   Field9 = col_logical(),
##   Field10 = col_logical()
## )

## See spec(...) for full column specifications.

## Warning: 6 parsing failures.
##  row              col expected        actual                                                               file
## 4464 市町村内症例番号 a double 神戸59/西宮36 'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4465 市町村内症例番号 a double 神戸60/西宮37 'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4830 市町村内症例番号 a double 千葉無症状70  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4831 市町村内症例番号 a double 千葉無症状71  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 5921 市町村内症例番号 a double 千葉無症状72  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## .... ................ ........ ............. ..................................................................
## See problems(...) for more details.

colnames(jag)

##  [1] "通し"               "厚労省NO"           "無症状病原体保有者"
##  [4] "国内"               "チャーター便"       "年代"              
##  [7] "性別"               "確定日"             "発症日"            
## [10] "受診都道府県"       "居住都道府県"       "居住管内"          
## [13] "居住市区町村"       "キー"               "発表"              
## [16] "都道府県内症例番号" "市町村内症例番号"   "ステータス"        
## [19] "備考"               "ソース"             "ソース2"           
## [22] "ソース3"            "人数"               "累計"              
## [25] "前日比"             "発症数"             "死者合計"          
## [28] "退院数累計"         "退院数"             "PCR検査実施人数"   
## [31] "PCR検査前日比"      "職業_正誤確認用"    "勤務先_正誤確認用" 
## [34] "Hospital Pref"      "Residential Pref"   "Release"           
## [37] "Gender"             "X"                  "Y"                 
## [40] "確定日YYYYMMDD"     "受診都道府県コード" "居住都道府県コード"
## [43] "更新日時"           "Field2"             "Field4"            
## [46] "Field5"             "Field6"             "Field7"            
## [49] "Field8"             "Field9"             "Field10"

jag <- jag[,6:11]
head(jag)

#as.Date(jag$発症日[1], "%m/%d/%Y")
jag_c <- jag %>% mutate(Date_Confirmed = as.Date(jag$確定日, "%m/%d/%Y")) %>%
  filter(jag$性別 %in% c("男性", "女性"), is.na(Date_Confirmed) == FALSE)
jag_o <- jag %>% mutate(Date_of_Onset = as.Date(jag$発症日, "%m/%d/%Y")) %>%
  filter(jag$性別 %in% c("男性", "女性"), is.na(Date_of_Onset) == FALSE)
dim(jag_o)

## [1] 7226    7

dim(jag_c)

## [1] 12767     7

ic.7.group <- incidence(jag_c$Date_Confirmed, interval = 7, groups = jag_c$性別)
plot(ic.7.group, border = "white") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "男女別週毎の確定者数 (確定日・性別不明を除く)")

ic.7.group

## <incidence object>
## [12767 cases from days 2020-01-13 to 2020-04-20]
## [12767 cases from ISO weeks 2020-W03 to 2020-W17]
## [2 groups: 女性, 男性]
## 
## $counts: matrix with 15 rows and 2 columns
## $n: 12767 cases in total
## $dates: 15 dates marking the left-side of bins
## $interval: 7 days
## $timespan: 99 days
## $cumulative: FALSE

i.7.group <- incidence(jag_o$Date_of_Onset, interval = 7, groups = jag_o$性別)
i.7.group

## <incidence object>
## [7226 cases from days 2019-12-30 to 2020-04-20]
## [7226 cases from ISO weeks 2020-W01 to 2020-W17]
## [2 groups: 女性, 男性]
## 
## $counts: matrix with 17 rows and 2 columns
## $n: 7226 cases in total
## $dates: 17 dates marking the left-side of bins
## $interval: 7 days
## $timespan: 113 days
## $cumulative: FALSE

plot(i.7.group, border = "white") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "男女別週毎の発症者数 (発症日・性別不明を除く)")

ic.1.group <- incidence(jag_c$Date_Confirmed, interval = 1, groups = jag_c$性別)
ic.1.group

## <incidence object>
## [12767 cases from days 2020-01-15 to 2020-04-25]
## [2 groups: 女性, 男性]
## 
## $counts: matrix with 102 rows and 2 columns
## $n: 12767 cases in total
## $dates: 102 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 102 days
## $cumulative: FALSE

plot(ic.1.group, border = "white") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "男女別確定者数 (確定日・性別不明を除く)")

i.1.group <- incidence(jag_o$Date_of_Onset, interval = 1, groups = jag_o$性別)
i.1.group

## <incidence object>
## [7226 cases from days 2020-01-03 to 2020-04-24]
## [2 groups: 女性, 男性]
## 
## $counts: matrix with 113 rows and 2 columns
## $n: 7226 cases in total
## $dates: 113 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 113 days
## $cumulative: FALSE

plot(i.1.group, border = "white") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "男女別発症者数 (発症日・性別不明を除く)")

日本は、収束に向かっているとは言えないので、まだ適用できないが、一応、fit_optim_split, add_incidence_fit を加えてみる。

#library('magrittr')
i.1 <- incidence(jag_o$Date_of_Onset, interval = 1)
fos <- fit_optim_split(i.1)
fos$split

## [1] "2020-04-14"

fos$fit

## <list of incidence_fit objects>
## 
## attr(x, 'locations'): list of vectors with the locations of each incidence_fit object
## 
## 'before'
## 'after'
## 
## $model: regression of log-incidence over time
## 
## $info: list containing the following items:
##   $r (daily growth rate):
##      before       after 
##  0.06492453 -0.39656604 
## 
##   $r.conf (confidence interval):
##             2.5 %      97.5 %
## before  0.0610089  0.06884015
## after  -0.5695500 -0.22358205
## 
##   $doubling (doubling time in days):
##  before 
## 10.6762 
## 
##   $doubling.conf (confidence interval):
##           2.5 %   97.5 %
## before 10.06894 11.36141
## 
##   $halving (halving time in days):
##    after 
## 1.747873 
## 
##   $halving.conf (confidence interval):
##          2.5 %   97.5 %
## after 1.217008 3.100192
## 
##   $pred: data.frame of incidence predictions (97 rows, 6 columns)

plot(i.1, border = "white") %>%
  add_incidence_fit(fos$fit) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "二分した対数線形回帰モデル")

最初の90日で求めてみる。

early.fit <- fit(i.1[1:90])

## Warning in fit(i.1[1:90]): 17 dates with incidence of 0 ignored for fitting

early.fit

## <incidence_fit object>
## 
## $model: regression of log-incidence over time
## 
## $info: list containing the following items:
##   $r (daily growth rate):
## [1] 0.06727837
## 
##   $r.conf (confidence interval):
##           2.5 %     97.5 %
## [1,] 0.06231059 0.07224614
## 
##   $doubling (doubling time in days):
## [1] 10.30268
## 
##   $doubling.conf (confidence interval):
##         2.5 %   97.5 %
## [1,] 9.594245 11.12407
## 
##   $pred: data.frame of incidence predictions (73 rows, 5 columns)

p1 <- plot(early.fit) +  
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "対数線形回帰モデル")

p2 <- plot(i.1[1:90], fit = early.fit, color = "red") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "ヒストグラムと対数線形回帰モデル")

grid.arrange(p1, p2, ncol=2)

4.4.2 Analysis of Korea using Johns Hopkins University data

##### Korea
korea %>% filter(Reported_Date > "2020-02-01") %>% head()

head(korea)

korea <- korea[,c(3,5)] 
korea[1,2] <- 1
i.korea <- as.incidence(korea$Daily_Confirmed_Cases, korea$Reported_Date)
plot(i.korea)

best.fit model using fit_optim_split

best.fit <- fit_optim_split(i.korea)
best.fit

## $df
##         dates   mean.R2
## 1  2020-02-20 0.3767131
## 2  2020-02-21 0.4634238
## 3  2020-02-22 0.5343370
## 4  2020-02-23 0.5797723
## 5  2020-02-24 0.6220872
## 6  2020-02-25 0.6518315
## 7  2020-02-26 0.6866798
## 8  2020-02-27 0.7095560
## 9  2020-02-28 0.7248289
## 10 2020-02-29 0.7369699
## 11 2020-03-01 0.7457659
## 12 2020-03-02 0.7521842
## 13 2020-03-03 0.7573995
## 14 2020-03-04 0.7598399
## 15 2020-03-05 0.7595860
## 16 2020-03-06 0.7580728
## 17 2020-03-07 0.7548353
## 18 2020-03-08 0.7472739
## 19 2020-03-09 0.7349910
## 20 2020-03-10 0.7108738
## 21 2020-03-11 0.7518267
## 22 2020-03-12 0.7384363
## 23 2020-03-13 0.7337232
## 24 2020-03-14 0.7295472
## 25 2020-03-15 0.7230293
## 
## $split
## [1] "2020-03-04"
## 
## $fit
## <list of incidence_fit objects>
## 
## attr(x, 'locations'): list of vectors with the locations of each incidence_fit object
## 
## 'before'
## 'after'
## 
## $model: regression of log-incidence over time
## 
## $info: list containing the following items:
##   $r (daily growth rate):
##     before      after 
##  0.1877725 -0.0613719 
## 
##   $r.conf (confidence interval):
##              2.5 %      97.5 %
## before  0.14523859  0.23030642
## after  -0.07021853 -0.05252527
## 
##   $doubling (doubling time in days):
##  before 
## 3.69142 
## 
##   $doubling.conf (confidence interval):
##           2.5 %   97.5 %
## before 3.009674 4.772472
## 
##   $halving (halving time in days):
##    after 
## 11.29421 
## 
##   $halving.conf (confidence interval):
##          2.5 %   97.5 %
## after 9.871286 13.19645
## 
##   $pred: data.frame of incidence predictions (83 rows, 6 columns)
## 
## $plot

plot(i.korea, fit = best.fit$fit, color = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black")) + 
  theme_gray(base_family = "HiraKakuPro-W3") + 
  labs(title = "Korea: 二分した対数線形回帰モデル")