OECD (Organisation for Economic Co-operation and Development) のデータのアクセスについて
OECD Data からいくつかの方法で、データにアクセスできます。
膨大なデータベースにアクセスし、その簡単な表やグラフや情報を表示し、Download(表やグラフの取得)ができます。
R
Package OECD
のvignettes の最後に、 Alternative data-acquisition strategy として、出力を Export menu から、Excel, csv, PC-axis, API, SDMX (XML) とある、5番目の選択肢を利用する方法を説明し、例を挙げています。以下のデータ入手方法の説明がついています。library(OECD)
df <- get_dataset("REGION_INNOVATION",
"1+2.BEL+BE1+BE2+BE3+BEZZ.PCT_BIOTECH.ALL+AVG.2013+2014+2015")
head(df)
R
Package OECD
R
の PackageOECD
でデータを入手することも可能です。vignettes は、R Studio の Help に、OECD を入力して見ることもできます。わかりやすい説明とともに、unemploument に関するがあります。
get_datasets
で データベースのリストを入手し、search_dataset
で必要な、データを探し、get_data_structure
で当該データの構造を調べ、必要に応じて、その一部を選択してget_dataset
の Filter 機能も用いて、データを入手します。必要に応じて、browse_metadata
によりデータの変数や定義など内容を調べることができます。
browse_metadata
: Browse the metadata related to a series.
browse_metadata("DUR_D")
: Web Browser が立ち上がります。get_dataset
: Download OECD data sets.
df <- get_dataset("EPL_OV")
: Data Frame(データフレーム)を入手します。get_datasets
: Get a data frame with information on all available datasets.
dataset_list <- get_datasets()
: Data List を、Data Frame(データフレーム)形式で入手します。get_data_structure
: Get the data structure of a dataset.
get_data_structure("DUR_D")
: データ構造を、List 形式で入手します。search_dataset
: Search codes and descriptions of available OECD series
search_dataset("employment", dataset_list)
: get_datasets()
で入手した、Data Frame から、Key Word で、関連するデータを抽出します。install.packages("OECD")
library(OECD)
dataset_list <- get_datasets()
dataset_list
vignettes には、unemployment(失業統計)分野をさがし、リストの 93 番目の、DUR_I
, the duration of unemployment の分析があります。
朝日新聞 2020年03月21日 朝刊の「いちからわかる! 日本の途上国援助は世界4位なんだね 」に、主要国のODA実績額(2018年から、利率や返済期間、援助内容も考慮した新たな実質的な計算方法に変更)の推移のグラフがあり、支出先トップ3として、2018年のデータとして、インド、バングラデシュ、ベトナムとあった。グラフは最大の援助国、米国、ドイツ、英国、日本の中で、日本がトップだった、2000年から始めている。ソースは OECD。もう少し前からの、推移など、見てみることにした。「1位だったけど下がっていて、政策も変わってきたよ」とあり、内容も興味深い。
Development Aid 開発援助は、OECD の一つの柱でもあり、Topic から Development を選択すると、UN の Sustainable Development Goals (SDGs)も、含めて、関連の情報を、見ることができます。
いくつかのデータ取得方法を紹介します。
dp_live <- read.table("data/DP_LIVE_28032020124154065.csv", header=TRUE, sep=",")
head(dp_live)
Least Developing Countries: 2000-2018 最貧国援助
Unspecified: 2000-2018 すべて
Least Developing Countries: 1980-2018 最貧国援助
Unspecified: 1980年からの場合 すべて
rsdmx
package でデータを入手することができます。library(rsdmx)
dac2a_url <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/TABLE2A/10200.301+4+5+6+701+12+302.1.201.A+D/all?startTime=2000&endTime=2018"
dac2a_dataset <- readSDMX(dac2a_url)
dac2a_df <- as.data.frame(dac2a_dataset)
head(dac2a_df)
OECD
Packagelibrary(OECD)
dataset_list <- get_datasets()
dataset_list
oda <- search_dataset("ODA", data = dataset_list)
どの Country(国)などが、どの Country を援助しているかなどがリストされます。 最初の、41 TABLE2A, Aid (ODA) disbursements to countries and regions [DAC2a] は、どの国や地域を援助しているか、二番目の、121 TABLE5, Aid (ODA) by sector and donor [DAC5] は、援助をしている国、団体です。
すでに、データを見てきましたから、どれか見当がつくかたもおられるかもしれませんが、わからないときは、get_data_structure
で、データの構造をみます。特に、最初に、必ず、VAR_DESC がありますから、そのリストをとってみます。
library(tidyverse)
## ─ Attaching packages ───────────────────────────── tidyverse 1.3.0 ─
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ─ Conflicts ─────────────────────────────── tidyverse_conflicts() ─
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
var_desc <- function(x){get_data_structure(oda$id[x])$VAR_DESC}
oda_var_desc <- 1:nrow(oda) %>% map(var_desc)
oda_var_desc
## [[1]]
## id description
## 1 RECIPIENT Recipient
## 2 DONOR Donor
## 3 PART Part
## 4 AIDTYPE Aid type
## 5 DATATYPE Amount type
## 6 TIME Year
## 7 OBS_VALUE Observation Value
## 8 TIME_FORMAT Time Format
## 9 OBS_STATUS Observation Status
## 10 UNIT Unit
## 11 POWERCODE Unit multiplier
## 12 REFERENCEPERIOD Reference period
##
## [[2]]
## id description
## 1 DONOR Donor
## 2 SECTOR Sector
## 3 AIDTYPE Aid type
## 4 AMOUNT Amount type
## 5 TIME Year
## 6 OBS_VALUE Observation Value
## 7 TIME_FORMAT Time Format
## 8 OBS_STATUS Observation Status
## 9 UNIT Unit
## 10 POWERCODE Unit multiplier
## 11 REFERENCEPERIOD Reference period
##
## [[3]]
## id description
## 1 DAC_DONOR Donor
## 2 PART Part
## 3 TRANSACTYPE Aid type
## 4 FLOWS Fund flows
## 5 DATATYPE Amount type
## 6 TIME Year
## 7 OBS_VALUE Observation Value
## 8 TIME_FORMAT Time Format
## 9 OBS_STATUS Observation Status
## 10 UNIT Unit
## 11 POWERCODE Unit multiplier
## 12 REFERENCEPERIOD Reference period
##
## [[4]]
## id description
## 1 RECIPIENT Recipient
## 2 DONOR Donor
## 3 PART Part
## 4 AIDTYPE Aid type
## 5 DATATYPE Amount type
## 6 TIME Year
## 7 OBS_VALUE Observation Value
## 8 TIME_FORMAT Time Format
## 9 OBS_STATUS Observation Status
## 10 UNIT Unit
## 11 POWERCODE Unit multiplier
## 12 REFERENCEPERIOD Reference period
##
## [[5]]
## id description
## 1 DAC_DONOR Donor
## 2 TYINGSTATUS Tying status
## 3 AIDTYPE Aid type
## 4 TIME Year
## 5 OBS_VALUE Observation Value
## 6 TIME_FORMAT Time Format
## 7 OBS_STATUS Observation Status
## 8 UNIT Unit
## 9 POWERCODE Unit multiplier
## 10 REFERENCEPERIOD Reference period
##
## [[6]]
## id description
## 1 RECIPIENT Recipient
## 2 DONOR Donor
## 3 AIDTYPE Aid type
## 4 PART Part
## 5 AMOUNTTYPE Amount type
## 6 TIME Year
## 7 OBS_VALUE Observation Value
## 8 TIME_FORMAT Time Format
## 9 OBS_STATUS Observation Status
## 10 UNIT Unit
## 11 POWERCODE Unit multiplier
## 12 REFERENCEPERIOD Reference period
##
## [[7]]
## id description
## 1 INDICATOR Indicator
## 2 MEASURE Measure
## 3 LOCATION Country
## 4 OBS_VALUE Observation Value
## 5 TIME_FORMAT Time Format
## 6 OBS_STATUS Observation Status
##
## [[8]]
## id description
## 1 INDICATOR Indicator
## 2 TIME Time
## 3 OBS_VALUE Observation Value
## 4 TIME_FORMAT Time Format
## 5 OBS_STATUS Observation Status
##
## [[9]]
## id description
## 1 INDICATOR Indicator
## 2 TIME Time
## 3 OBS_VALUE Observation Value
## 4 TIME_FORMAT Time Format
## 5 OBS_STATUS Observation Status
##
## [[10]]
## id description
## 1 DONOR Donor
## 2 RECIPIENT Recipient
## 3 SECTOR Sector
## 4 TIME Year
## 5 OBS_VALUE Observation Value
## 6 TIME_FORMAT Time Format
## 7 OBS_STATUS Observation Status
## 8 UNIT Unit
## 9 POWERCODE Unit multiplier
## 10 REFERENCEPERIOD Reference period
##
## [[11]]
## id description
## 1 INDICATOR Indicator
## 2 TIME Time
## 3 OBS_VALUE Observation Value
## 4 TIME_FORMAT Time Format
## 5 OBS_STATUS Observation Status
##
## [[12]]
## id description
## 1 INDICATOR Indicator
## 2 TIME Time
## 3 OBS_VALUE Observation Value
## 4 TIME_FORMAT Time Format
## 5 OBS_STATUS Observation Status
##
## [[13]]
## id description
## 1 RECIPIENT Recipient
## 2 DONOR Donor
## 3 PART Part
## 4 TIME Year
## 5 OBS_VALUE Observation Value
## 6 TIME_FORMAT Time Format
## 7 OBS_STATUS Observation Status
## 8 UNIT Unit
## 9 POWERCODE Unit multiplier
## 10 REFERENCEPERIOD Reference period
TABLE2A
を入手します。df <- get_dataset(dataset = table2a)
は少し時間がかかります。table2a_structure$DONOR
を使って、国 Code と国名をマッチさせます。table2a <- "TABLE2A"
table2a_structure <- get_data_structure(table2a)
str(table2a_structure, max.level = 1)
## List of 12
## $ VAR_DESC :'data.frame': 12 obs. of 2 variables:
## $ RECIPIENT :'data.frame': 283 obs. of 2 variables:
## $ DONOR :'data.frame': 147 obs. of 2 variables:
## $ PART :'data.frame': 2 obs. of 2 variables:
## $ AIDTYPE :'data.frame': 24 obs. of 2 variables:
## $ DATATYPE :'data.frame': 2 obs. of 2 variables:
## $ TIME :'data.frame': 59 obs. of 2 variables:
## $ OBS_STATUS :'data.frame': 15 obs. of 2 variables:
## $ UNIT :'data.frame': 316 obs. of 2 variables:
## $ POWERCODE :'data.frame': 32 obs. of 2 variables:
## $ REFERENCEPERIOD:'data.frame': 96 obs. of 2 variables:
## $ TIME_FORMAT :'data.frame': 5 obs. of 2 variables:
str(table2a_structure)
## List of 12
## $ VAR_DESC :'data.frame': 12 obs. of 2 variables:
## ..$ id : chr [1:12] "RECIPIENT" "DONOR" "PART" "AIDTYPE" ...
## ..$ description: chr [1:12] "Recipient" "Donor" "Part" "Aid type" ...
## $ RECIPIENT :'data.frame': 283 obs. of 2 variables:
## ..$ id : chr [1:283] "10200" "10100" "10010" "71" ...
## ..$ label: chr [1:283] "All Recipients, Total" "Developing Countries, Total" "Europe, Total" "Albania" ...
## $ DONOR :'data.frame': 147 obs. of 2 variables:
## ..$ id : chr [1:147] "20005" "20001" "801" "1" ...
## ..$ label: chr [1:147] "Official Donors, Total" "DAC Countries, Total" "Australia" "Austria" ...
## $ PART :'data.frame': 2 obs. of 2 variables:
## ..$ id : chr [1:2] "1" "2"
## ..$ label: chr [1:2] "1 : Part I - Developing Countries" "2 : Part II - Countries in Transition"
## $ AIDTYPE :'data.frame': 24 obs. of 2 variables:
## ..$ id : chr [1:24] "201" "212" "221" "208" ...
## ..$ label: chr [1:24] "Grants, Total" "Grants: Debt Forgiveness" "Grants: Other Debt Grants" "AF/Interest Subsidies" ...
## $ DATATYPE :'data.frame': 2 obs. of 2 variables:
## ..$ id : chr [1:2] "A" "D"
## ..$ label: chr [1:2] "Current Prices" "Constant Prices"
## $ TIME :'data.frame': 59 obs. of 2 variables:
## ..$ id : chr [1:59] "1960" "1961" "1962" "1963" ...
## ..$ label: chr [1:59] "1960" "1961" "1962" "1963" ...
## $ OBS_STATUS :'data.frame': 15 obs. of 2 variables:
## ..$ id : chr [1:15] "B" "C" "D" "E" ...
## ..$ label: chr [1:15] "Break" "Non-publishable and confidential value" "Difference in methodology" "Estimated value" ...
## $ UNIT :'data.frame': 316 obs. of 2 variables:
## ..$ id : chr [1:316] "1" "GRWH" "AVGRW" "IDX" ...
## ..$ label: chr [1:316] "RATIOS" "Growth rate" "Average growth rate" "Index" ...
## $ POWERCODE :'data.frame': 32 obs. of 2 variables:
## ..$ id : chr [1:32] "0" "1" "2" "3" ...
## ..$ label: chr [1:32] "Units" "Tens" "Hundreds" "Thousands" ...
## $ REFERENCEPERIOD:'data.frame': 96 obs. of 2 variables:
## ..$ id : chr [1:96] "2013_100" "2012_100" "2011_100" "2010_100" ...
## ..$ label: chr [1:96] "2013=100" "2012=100" "2011=100" "2010=100" ...
## $ TIME_FORMAT :'data.frame': 5 obs. of 2 variables:
## ..$ id : chr [1:5] "P1Y" "P1M" "P3M" "P6M" ...
## ..$ label: chr [1:5] "Annual" "Monthly" "Quarterly" "Half-yearly" ...
table2a_structure$VAR_DESC
table2a_structure$RECIPIENT
table2a_structure$DONOR
table2a_structure$AIDTYPE
# Set G7 Countries
country <- c("Canada", "Germany", "France", "United Kingdom", "Italy", "Japan", "United States")
# The following is possible.
# country <- c("301","4","5","6","701","12","302")
df <- get_dataset(dataset = table2a)
head(df)
str(df)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1000000 obs. of 11 variables:
## $ RECIPIENT : chr "10100" "10100" "10100" "10100" ...
## $ DONOR : chr "20005" "20005" "20005" "20005" ...
## $ PART : chr "1" "1" "1" "1" ...
## $ AIDTYPE : chr "201" "201" "201" "201" ...
## $ DATATYPE : chr "A" "A" "A" "A" ...
## $ TIME_FORMAT : chr "P1Y" "P1Y" "P1Y" "P1Y" ...
## $ UNIT : chr "USD" "USD" "USD" "USD" ...
## $ POWERCODE : chr "6" "6" "6" "6" ...
## $ REFERENCEPERIOD: chr NA NA NA NA ...
## $ obsTime : chr "1960" "1961" "1962" "1963" ...
## $ obsValue : num 3813 4133 4240 4211 4065 ...
donor_list <- table2a_structure$DONOR
names(donor_list) <- c("DONOR", "Country")
dff <- left_join(df, donor_list)
## Joining, by = "DONOR"
head(dff)
df <- get_dataset(dataset = table2a)
head(df)
str(table2a_structure)
## List of 12
## $ VAR_DESC :'data.frame': 12 obs. of 2 variables:
## ..$ id : chr [1:12] "RECIPIENT" "DONOR" "PART" "AIDTYPE" ...
## ..$ description: chr [1:12] "Recipient" "Donor" "Part" "Aid type" ...
## $ RECIPIENT :'data.frame': 283 obs. of 2 variables:
## ..$ id : chr [1:283] "10200" "10100" "10010" "71" ...
## ..$ label: chr [1:283] "All Recipients, Total" "Developing Countries, Total" "Europe, Total" "Albania" ...
## $ DONOR :'data.frame': 147 obs. of 2 variables:
## ..$ id : chr [1:147] "20005" "20001" "801" "1" ...
## ..$ label: chr [1:147] "Official Donors, Total" "DAC Countries, Total" "Australia" "Austria" ...
## $ PART :'data.frame': 2 obs. of 2 variables:
## ..$ id : chr [1:2] "1" "2"
## ..$ label: chr [1:2] "1 : Part I - Developing Countries" "2 : Part II - Countries in Transition"
## $ AIDTYPE :'data.frame': 24 obs. of 2 variables:
## ..$ id : chr [1:24] "201" "212" "221" "208" ...
## ..$ label: chr [1:24] "Grants, Total" "Grants: Debt Forgiveness" "Grants: Other Debt Grants" "AF/Interest Subsidies" ...
## $ DATATYPE :'data.frame': 2 obs. of 2 variables:
## ..$ id : chr [1:2] "A" "D"
## ..$ label: chr [1:2] "Current Prices" "Constant Prices"
## $ TIME :'data.frame': 59 obs. of 2 variables:
## ..$ id : chr [1:59] "1960" "1961" "1962" "1963" ...
## ..$ label: chr [1:59] "1960" "1961" "1962" "1963" ...
## $ OBS_STATUS :'data.frame': 15 obs. of 2 variables:
## ..$ id : chr [1:15] "B" "C" "D" "E" ...
## ..$ label: chr [1:15] "Break" "Non-publishable and confidential value" "Difference in methodology" "Estimated value" ...
## $ UNIT :'data.frame': 316 obs. of 2 variables:
## ..$ id : chr [1:316] "1" "GRWH" "AVGRW" "IDX" ...
## ..$ label: chr [1:316] "RATIOS" "Growth rate" "Average growth rate" "Index" ...
## $ POWERCODE :'data.frame': 32 obs. of 2 variables:
## ..$ id : chr [1:32] "0" "1" "2" "3" ...
## ..$ label: chr [1:32] "Units" "Tens" "Hundreds" "Thousands" ...
## $ REFERENCEPERIOD:'data.frame': 96 obs. of 2 variables:
## ..$ id : chr [1:96] "2013_100" "2012_100" "2011_100" "2010_100" ...
## ..$ label: chr [1:96] "2013=100" "2012=100" "2011=100" "2010=100" ...
## $ TIME_FORMAT :'data.frame': 5 obs. of 2 variables:
## ..$ id : chr [1:5] "P1Y" "P1M" "P3M" "P6M" ...
## ..$ label: chr [1:5] "Annual" "Monthly" "Quarterly" "Half-yearly" ...
dp_live %>% ggplot(aes(x = TIME, y = Value)) +
geom_line(aes(color = LOCATION)) +
geom_point(aes(color = LOCATION))
DATATYPE: A, Current Prices
dac2a_df %>% group_by(DONOR) %>% filter(DATATYPE == "A") %>% select(DONOR, obsTime, obsValue) %>%
ggplot(aes(x = obsTime, y = obsValue, group = DONOR)) +
geom_line(aes(color = DONOR)) +
geom_point(aes(color = DONOR))
DATATYPE: D, Constant Prices
dac2a_df %>% group_by(DONOR) %>% filter(DATATYPE == "D") %>%
ggplot(aes(x = obsTime, y = obsValue, group = DONOR)) +
geom_line(aes(color = DONOR)) +
geom_point(aes(color = DONOR))
OECD
Packagedff %>% filter(RECIPIENT == "10100", AIDTYPE == "201", Country %in% country, DATATYPE == "A") %>%
ggplot(aes(x = obsTime, y = obsValue, group = Country)) +
geom_line(aes(color = Country)) +
geom_point(aes(color = Country)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
labs(x = "Year", y = "US Dollar, Unit = 1 million USD")
get_dataset()
に代入して、データを取得する方法が書いてあったが、適切にできず、rsdmx
Package を使用。OECD
Package でデータを入手するとき、filter を試すが、あまりうまくいかない。