Index:
- AirPassengers Monthly Airline Passenger Numbers 1949-1960
- BJsales Sales Data with Leading Indicator
- BOD Biochemical Oxygen Demand
- CO2 Carbon Dioxide Uptake in Grass Plants
- ChickWeight Weight versus age of chicks on different diets
- DNase Elisa assay of DNase
- EuStockMarkets Daily Closing Prices of Major European Stock Indices, 1991-1998
- Formaldehyde Determination of Formaldehyde
- HairEyeColor Hair and Eye Color of Statistics Students
- Harman23.cor Harman Example 2.3
- Harman74.cor Harman Example 7.4
- Indometh Pharmacokinetics of Indomethacin
- InsectSprays Effectiveness of Insect Sprays
- JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
- LakeHuron Level of Lake Huron 1875-1972
- LifeCycleSavings Intercountry Life-Cycle Savings Data
- Loblolly Growth of Loblolly pine trees
- Nile Flow of the River Nile
- Orange Growth of Orange Trees
- OrchardSprays Potency of Orchard Sprays
- PlantGrowth Results from an Experiment on Plant Growth
- Puromycin Reaction Velocity of an Enzymatic Reaction
- Theoph Pharmacokinetics of Theophylline
- Titanic Survival of passengers on the Titanic
- ToothGrowth The Effect of Vitamin C on Tooth Growth in Guinea Pigs
- UCBAdmissions Student Admissions at UC Berkeley
- UKDriverDeaths Road Casualties in Great Britain 1969-84
- UKLungDeaths Monthly Deaths from Lung Diseases in the UK
- UKgas UK Quarterly Gas Consumption
- USAccDeaths Accidental Deaths in the US 1973-1978
- USArrests Violent Crime Rates by US State
- USJudgeRatings Lawyers’ Ratings of State Judges in the US Superior Court
- USPersonalExpenditure Personal Expenditure Data
- VADeaths Death Rates in Virginia (1940)
- WWWusage Internet Usage per Minute
- WorldPhones The World’s Telephones
- ability.cov Ability and Intelligence Tests
- airmiles Passenger Miles on Commercial US Airlines, 1937-1960
- airquality New York Air Quality Measurements
- anscombe Anscombe’s Quartet of ‘Identical’ Simple Linear Regressions
- attenu The Joyner-Boore Attenuation Data
- attitude The Chatterjee-Price Attitude Data
- austres Quarterly Time Series of the Number of Australian Residents
- beavers Body Temperature Series of Two Beavers
- cars Speed and Stopping Distances of Cars
- chickwts Chicken Weights by Feed Type
- co2 Mauna Loa Atmospheric CO2 Concentration
- crimtab Student’s 3000 Criminals Data
- datasets-package The R Datasets Package
- discoveries Yearly Numbers of Important Discoveries
- esoph Smoking, Alcohol and (O)esophageal Cancer
- euro Conversion Rates of Euro Currencies
- eurodist Distances Between European Cities and Between US Cities
- faithful Old Faithful Geyser Data
- freeny Freeny’s Revenue Data
- infert Infertility after Spontaneous and Induced Abortion
- iris Edgar Anderson’s Iris Data
- islands Areas of the World’s Major Landmasses
- lh Luteinizing Hormone in Blood Samples
- longley Longley’s Economic Regression Data
- lynx Annual Canadian Lynx trappings 1821-1934
- morley Michelson Speed of Light Data
- mtcars Motor Trend Car Road Tests
- nhtemp Average Yearly Temperatures in New Haven
- nottem Average Monthly Temperatures at Nottingham, 1920-1939
- npk Classical N, P, K Factorial Experiment
- occupationalStatus Occupational Status of Fathers and their Sons
- precip Annual Precipitation in US Cities
- presidents Quarterly Approval Ratings of US Presidents
- pressure Vapor Pressure of Mercury as a Function of Temperature
- quakes Locations of Earthquakes off Fiji
- randu Random Numbers from Congruential Generator RANDU
- rivers Lengths of Major North American Rivers
- rock Measurements on Petroleum Rock Samples
- sleep Student’s Sleep Data
- stackloss Brownlee’s Stack Loss Plant Data
- state US State Facts and Figures
- sunspot.month Monthly Sunspot Data, from 1749 to “Present”
- sunspot.year Yearly Sunspot Data, 1700-1988
- sunspots Monthly Sunspot Numbers, 1749-1983
- swiss Swiss Fertility and Socioeconomic Indicators (1888) Data
- treering Yearly Treering Data, -6000-1979
- trees Diameter, Height and Volume for Black Cherry Trees
- uspop Populations Recorded by the US Census
- volcano Topographic Information on Auckland’s Maunga Whau Volcano
- warpbreaks The Number of Breaks in Yarn during Weaving
- women Average Heights and Weights for American Women
AirPassengers Monthly Airline Passenger Numbers 1949-1960
Description
The classic Box & Jenkins airline data. Monthly totals of international airline passengers, 1949 to 1960.
Usage
AirPassengers
Format
A monthly time series, in thousands.
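A quick check of the time-series attributes (a sketch, not part of the original help page) confirms the monthly sampling and the 1949-1960 span:

```r
## Sketch: inspect the ts attributes of AirPassengers
frequency(AirPassengers)  # 12 observations per year (monthly)
start(AirPassengers)      # c(1949, 1)
end(AirPassengers)        # c(1960, 12)
length(AirPassengers)     # 144 = 12 years x 12 months
```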
Source
Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) Time Series Analysis, Forecasting and Control. Third Edition. Holden-Day. Series G.
Examples
## Not run:
## These are quite slow and so not run by example(AirPassengers)
## The classic 'airline model', by full ML
(fit <- arima(log10(AirPassengers), c(0, 1, 1),
seasonal = list(order = c(0, 1, 1), period = 12)))
update(fit, method = "CSS")
update(fit, x = window(log10(AirPassengers), start = 1954))
pred <- predict(fit, n.ahead = 24)
tl <- pred$pred - 1.96 * pred$se
tu <- pred$pred + 1.96 * pred$se
ts.plot(AirPassengers, 10^tl, 10^tu, log = "y", lty = c(1, 2, 2))
## full ML fit is the same if the series is reversed, CSS fit is not
ap0 <- rev(log10(AirPassengers))
attributes(ap0) <- attributes(AirPassengers)
arima(ap0, c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
arima(ap0, c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12),
method = "CSS")
## Structural Time Series
ap <- log10(AirPassengers) - 2
(fit <- StructTS(ap, type = "BSM"))
par(mfrow = c(1, 2))
plot(cbind(ap, fitted(fit)), plot.type = "single")
plot(cbind(ap, tsSmooth(fit)), plot.type = "single")
## End(Not run)
head(AirPassengers)
## [1] 112 118 132 129 121 135
## The classic 'airline model', by full ML
(fit <- arima(log10(AirPassengers), c(0, 1, 1),
seasonal = list(order = c(0, 1, 1), period = 12)))
##
## Call:
## arima(x = log10(AirPassengers), order = c(0, 1, 1), seasonal = list(order = c(0,
## 1, 1), period = 12))
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.0002543: log likelihood = 353.96, aic = -701.92
update(fit, method = "CSS")
##
## Call:
## arima(x = log10(AirPassengers), order = c(0, 1, 1), seasonal = list(order = c(0,
## 1, 1), period = 12), method = "CSS")
##
## Coefficients:
## ma1 sma1
## -0.3772 -0.5724
## s.e. 0.0883 0.0704
##
## sigma^2 estimated as 0.0002619: part log likelihood = 354.32
update(fit, x = window(log10(AirPassengers), start = 1954))
##
## Call:
## arima(x = window(log10(AirPassengers), start = 1954), order = c(0, 1, 1), seasonal = list(order = c(0,
## 1, 1), period = 12))
##
## Coefficients:
## ma1 sma1
## -0.4797 -0.4460
## s.e. 0.1000 0.1514
##
## sigma^2 estimated as 0.0001603: log likelihood = 208.02, aic = -410.04
pred <- predict(fit, n.ahead = 24)
tl <- pred$pred - 1.96 * pred$se
tu <- pred$pred + 1.96 * pred$se
ts.plot(AirPassengers, 10^tl, 10^tu, log = "y", lty = c(1, 2, 2))
## full ML fit is the same if the series is reversed, CSS fit is not
ap0 <- rev(log10(AirPassengers))
attributes(ap0) <- attributes(AirPassengers)
arima(ap0, c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
##
## Call:
## arima(x = ap0, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.0002543: log likelihood = 353.95, aic = -701.91
arima(ap0, c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12),
method = "CSS")
##
## Call:
## arima(x = ap0, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12),
## method = "CSS")
##
## Coefficients:
## ma1 sma1
## -0.4188 -0.5224
## s.e. 0.0875 0.0695
##
## sigma^2 estimated as 0.0002673: part log likelihood = 353
## Structural Time Series
ap <- log10(AirPassengers) - 2
(fit <- StructTS(ap, type = "BSM"))
##
## Call:
## StructTS(x = ap, type = "BSM")
##
## Variances:
## level slope seas epsilon
## 0.0001456 0.0000000 0.0002635 0.0000000
par(mfrow = c(1, 2))
plot(cbind(ap, fitted(fit)), plot.type = "single")
plot(cbind(ap, tsSmooth(fit)), plot.type = "single")
BJsales Sales Data with Leading Indicator
Description
The sales time series BJsales and leading indicator BJsales.lead each contain 150 observations. The objects are of class "ts".
Usage
BJsales
BJsales.lead
Source
The data are given in Box & Jenkins (1976). Obtained from the Time Series Data Library at http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/
References
G. E. P. Box and G. M. Jenkins (1976): Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, p. 537.
P. J. Brockwell and R. A. Davis (1991): Time Series: Theory and Methods, Second edition, Springer Verlag, NY, p. 414.
head(BJsales)
## [1] 200.1 199.5 199.4 198.9 199.0 200.2
head(BJsales.lead)
## [1] 10.01 10.07 10.32 9.75 10.33 10.13
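The help page has no Examples section. A minimal sketch of exploiting the leading indicator (the three-period lag below is a common textbook choice for these data, used here purely for illustration, not taken from this help page):

```r
## Sketch: how far does BJsales.lead lead BJsales?
## The cross-correlation function peaks at negative lags when the
## indicator leads the sales series.
ccf(BJsales.lead, BJsales, lag.max = 10)

## Regress sales on the indicator lagged three periods (illustrative lag)
fm <- lm(BJsales[4:150] ~ BJsales.lead[1:147])
summary(fm)
```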
BOD Biochemical Oxygen Demand
Description
The BOD data frame has 6 rows and 2 columns giving the biochemical oxygen demand versus time in an evaluation of water quality.
Usage
BOD
Format
This data frame contains the following columns:
Time
A numeric vector giving the time of the measurement (days).
demand
A numeric vector giving the biochemical oxygen demand (mg/l).
Source
Bates, D.M. and Watts, D.G. (1988), Nonlinear Regression Analysis and Its Applications, Wiley, Appendix A1.4.
Originally from Marske (1967), Biochemical Oxygen Demand Data Interpretation Using Sum of Squares Surface, M.Sc. thesis, University of Wisconsin – Madison.
Examples
require(stats)
# simplest form of fitting a first-order model to these data
fm1 <- nls(demand ~ A*(1-exp(-exp(lrc)*Time)), data = BOD,
start = c(A = 20, lrc = log(.35)))
coef(fm1)
fm1
# using the plinear algorithm
fm2 <- nls(demand ~ (1-exp(-exp(lrc)*Time)), data = BOD,
start = c(lrc = log(.35)), algorithm = "plinear", trace = TRUE)
# using a self-starting model
fm3 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
summary(fm3)
head(BOD)
example(BOD)
##
## BOD> ## Don't show:
## BOD> options(show.nls.convergence=FALSE)
##
## BOD> old <- options(digits = 5)
##
## BOD> ## End(Don't show)
## BOD> require(stats)
##
## BOD> # simplest form of fitting a first-order model to these data
## BOD> fm1 <- nls(demand ~ A*(1-exp(-exp(lrc)*Time)), data = BOD,
## BOD+ start = c(A = 20, lrc = log(.35)))
##
## BOD> coef(fm1)
## A lrc
## 19.14258 -0.63282
##
## BOD> fm1
## Nonlinear regression model
## model: demand ~ A * (1 - exp(-exp(lrc) * Time))
## data: BOD
## A lrc
## 19.143 -0.633
## residual sum-of-squares: 26
##
## BOD> # using the plinear algorithm
## BOD> fm2 <- nls(demand ~ (1-exp(-exp(lrc)*Time)), data = BOD,
## BOD+ start = c(lrc = log(.35)), algorithm = "plinear", trace = TRUE)
## 32.946 : -1.0498 22.1260
## 25.992 : -0.62572 19.10319
## 25.99 : -0.6327 19.1419
## 25.99 : -0.63282 19.14256
##
## BOD> # using a self-starting model
## BOD> fm3 <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
##
## BOD> summary(fm3)
##
## Formula: demand ~ SSasympOrig(Time, A, lrc)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## A 19.143 2.496 7.67 0.0016 **
## lrc -0.633 0.382 -1.65 0.1733
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.55 on 4 degrees of freedom
##
##
## BOD> ## Don't show:
## BOD> options(old)
##
## BOD> ## End(Don't show)
## BOD>
## BOD>
## BOD>
CO2 Carbon Dioxide Uptake in Grass Plants
Description
The CO2 data frame has 84 rows and 5 columns of data from an experiment on the cold tolerance of the grass species Echinochloa crus-galli.
Usage
CO2
Format
An object of class c("nfnGroupedData", "nfGroupedData", "groupedData", "data.frame") containing the following columns:
Plant
an ordered factor with levels Qn1 < Qn2 < Qn3 < ... < Mc1 giving a unique identifier for each plant.
Type
a factor with levels Quebec Mississippi giving the origin of the plant
Treatment
a factor with levels nonchilled chilled
conc
a numeric vector of ambient carbon dioxide concentrations (mL/L).
uptake
a numeric vector of carbon dioxide uptake rates (umol/m^2 sec).
Details
The CO2 uptake of six plants from Quebec and six plants from Mississippi was measured at several levels of ambient CO2 concentration. Half the plants of each type were chilled overnight before the experiment was conducted.
This dataset was originally part of package nlme, and that has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
Source
Potvin, C., Lechowicz, M. J. and Tardif, S. (1990) “The statistical analysis of ecophysiological response curves obtained from experiments involving repeated measures”, Ecology, 71, 1389–1400.
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer.
Examples
require(stats); require(graphics)
coplot(uptake ~ conc | Plant, data = CO2, show.given = FALSE, type = "b")
## fit the data for the first plant
fm1 <- nls(uptake ~ SSasymp(conc, Asym, lrc, c0),
data = CO2, subset = Plant == "Qn1")
summary(fm1)
## fit each plant separately
fmlist <- list()
for (pp in levels(CO2$Plant)) {
fmlist[[pp]] <- nls(uptake ~ SSasymp(conc, Asym, lrc, c0),
data = CO2, subset = Plant == pp)
}
## check the coefficients by plant
print(sapply(fmlist, coef), digits = 3)
head(CO2)
example(CO2)
##
## CO2> require(stats); require(graphics)
##
## CO2> ## Don't show:
## CO2> options(show.nls.convergence=FALSE)
##
## CO2> ## End(Don't show)
## CO2> coplot(uptake ~ conc | Plant, data = CO2, show.given = FALSE, type = "b")
##
## CO2> ## fit the data for the first plant
## CO2> fm1 <- nls(uptake ~ SSasymp(conc, Asym, lrc, c0),
## CO2+ data = CO2, subset = Plant == "Qn1")
##
## CO2> summary(fm1)
##
## Formula: uptake ~ SSasymp(conc, Asym, lrc, c0)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 38.1398 0.9164 41.620 1.99e-06 ***
## lrc -34.2766 18.9661 -1.807 0.145
## c0 -4.3806 0.2042 -21.457 2.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.663 on 4 degrees of freedom
##
##
## CO2> ## fit each plant separately
## CO2> fmlist <- list()
##
## CO2> for (pp in levels(CO2$Plant)) {
## CO2+ fmlist[[pp]] <- nls(uptake ~ SSasymp(conc, Asym, lrc, c0),
## CO2+ data = CO2, subset = Plant == pp)
## CO2+ }
##
## CO2> ## check the coefficients by plant
## CO2> print(sapply(fmlist, coef), digits = 3)
## Qn1 Qn2 Qn3 Qc1 Qc3 Qc2 Mn3 Mn2 Mn1 Mc2 Mc3
## Asym 38.14 42.87 44.23 36.43 40.68 39.82 28.48 32.13 34.08 13.56 18.54
## lrc -34.28 -29.66 -37.63 -9.90 -11.54 -51.53 -17.37 -29.04 -8.81 -1.98 -136.11
## c0 -4.38 -4.67 -4.49 -4.86 -4.95 -4.46 -4.59 -4.47 -5.06 -4.56 -3.47
## Mc1
## Asym 21.79
## lrc 2.45
## c0 -5.14
ChickWeight Weight versus age of chicks on different diets
Description
The ChickWeight data frame has 578 rows and 4 columns from an experiment on the effect of diet on early growth of chicks.
Usage
ChickWeight
Format
An object of class c("nfnGroupedData", "nfGroupedData", "groupedData", "data.frame") containing the following columns:
weight
a numeric vector giving the body weight of the chick (gm).
Time
a numeric vector giving the number of days since birth when the measurement was made.
Chick
an ordered factor with levels 18 < ... < 48 giving a unique identifier for the chick. The ordering of the levels groups chicks on the same diet together and orders them according to their final weight (lightest to heaviest) within diet.
Diet
a factor with levels 1, ..., 4 indicating which experimental diet the chick received.
Details
The body weights of the chicks were measured at birth and every second day thereafter until day 20. They were also measured on day 21. There were four groups of chicks on different protein diets.
This dataset was originally part of package nlme, and that has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
Source
Crowder, M. and Hand, D. (1990), Analysis of Repeated Measures, Chapman and Hall (example 5.3)
Hand, D. and Crowder, M. (1996), Practical Longitudinal Data Analysis, Chapman and Hall (table A.2)
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer.
See Also
SSlogis for models fitted to this dataset.
Examples
require(graphics)
coplot(weight ~ Time | Chick, data = ChickWeight,
type = "b", show.given = FALSE)
head(ChickWeight)
example(ChickWeight)
##
## ChckWg> ## No test:
## ChckWg> ##D require(graphics)
## ChckWg> ##D coplot(weight ~ Time | Chick, data = ChickWeight,
## ChckWg> ##D type = "b", show.given = FALSE)
## ChckWg> ## End(No test)
## ChckWg>
## ChckWg>
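The See Also entry points to SSlogis for models fitted to this dataset. As a sketch (an assumed illustration, not part of this help page), a self-starting logistic growth curve can be fitted to a single chick:

```r
## Sketch: logistic growth model for one chick, via the self-starting
## SSlogis model referenced in See Also
fm <- nls(weight ~ SSlogis(Time, Asym, xmid, scal),
          data = ChickWeight, subset = Chick == 1)
summary(fm)  # Asym = asymptotic weight, xmid = inflection time, scal = growth scale
```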
DNase Elisa assay of DNase
Description
The DNase data frame has 176 rows and 3 columns of data obtained during development of an ELISA assay for the recombinant protein DNase in rat serum.
Usage
DNase
Format
An object of class c("nfnGroupedData", "nfGroupedData", "groupedData", "data.frame") containing the following columns:
Run
an ordered factor with levels 10 < ... < 3 indicating the assay run.
conc
a numeric vector giving the known concentration of the protein.
density
a numeric vector giving the measured optical density (dimensionless) in the assay. Duplicate optical density measurements were obtained.
Details
This dataset was originally part of package nlme, and that has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
Source
Davidian, M. and Giltinan, D. M. (1995) Nonlinear Models for Repeated Measurement Data, Chapman & Hall (section 5.2.4, p. 134)
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer.
Examples
require(stats); require(graphics)
coplot(density ~ conc | Run, data = DNase,
show.given = FALSE, type = "b")
coplot(density ~ log(conc) | Run, data = DNase,
show.given = FALSE, type = "b")
## fit a representative run
fm1 <- nls(density ~ SSlogis( log(conc), Asym, xmid, scal ),
data = DNase, subset = Run == 1)
## compare with a four-parameter logistic
fm2 <- nls(density ~ SSfpl( log(conc), A, B, xmid, scal ),
data = DNase, subset = Run == 1)
summary(fm2)
anova(fm1, fm2)
head(DNase)
example(DNase)
##
## DNase> require(stats); require(graphics)
##
## DNase> ## Don't show:
## DNase> options(show.nls.convergence=FALSE)
##
## DNase> ## End(Don't show)
## DNase> coplot(density ~ conc | Run, data = DNase,
## DNase+ show.given = FALSE, type = "b")
##
## DNase> coplot(density ~ log(conc) | Run, data = DNase,
## DNase+ show.given = FALSE, type = "b")
##
## DNase> ## fit a representative run
## DNase> fm1 <- nls(density ~ SSlogis( log(conc), Asym, xmid, scal ),
## DNase+ data = DNase, subset = Run == 1)
##
## DNase> ## compare with a four-parameter logistic
## DNase> fm2 <- nls(density ~ SSfpl( log(conc), A, B, xmid, scal ),
## DNase+ data = DNase, subset = Run == 1)
##
## DNase> summary(fm2)
##
## Formula: density ~ SSfpl(log(conc), A, B, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## A -0.007897 0.017200 -0.459 0.654
## B 2.377239 0.109516 21.707 5.35e-11 ***
## xmid 1.507403 0.102080 14.767 4.65e-09 ***
## scal 1.062579 0.056996 18.643 3.16e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01981 on 12 degrees of freedom
##
##
## DNase> anova(fm1, fm2)
## Analysis of Variance Table
##
## Model 1: density ~ SSlogis(log(conc), Asym, xmid, scal)
## Model 2: density ~ SSfpl(log(conc), A, B, xmid, scal)
## Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
## 1 13 0.0047896
## 2 12 0.0047073 1 8.2314e-05 0.2098 0.6551
EuStockMarkets Daily Closing Prices of Major European Stock Indices, 1991-1998
Description
Contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. The data are sampled in business time, i.e., weekends and holidays are omitted.
Usage
EuStockMarkets
Format
A multivariate time series with 1860 observations on 4 variables. The object is of class "mts".
Source
The data were kindly provided by Erste Bank AG, Vienna, Austria.
head(EuStockMarkets)
## DAX SMI CAC FTSE
## [1,] 1628.75 1678.1 1772.8 2443.6
## [2,] 1613.63 1688.5 1750.5 2460.2
## [3,] 1606.51 1678.6 1718.0 2448.2
## [4,] 1621.04 1684.1 1708.1 2470.4
## [5,] 1618.16 1686.6 1723.1 2484.7
## [6,] 1610.61 1671.6 1714.3 2466.8
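The help page gives no Examples section. A minimal sketch (not from the source) of two standard operations on such a multivariate series: plotting each index and converting prices to daily log-returns.

```r
## Sketch: one panel per index
plot(EuStockMarkets)

## Daily log-returns; diff(log(.)) keeps the "mts" structure
returns <- diff(log(EuStockMarkets))
summary(returns[, "DAX"])
```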
Formaldehyde Determination of Formaldehyde
Description
These data are from a chemical experiment to prepare a standard curve for the determination of formaldehyde by the addition of chromotropic acid and concentrated sulphuric acid and the reading of the resulting purple color on a spectrophotometer.
Usage
Formaldehyde
Format
A data frame with 6 observations on 2 variables.
[,1] carb numeric Carbohydrate (ml)
[,2] optden numeric Optical Density
Source
Bennett, N. A. and N. L. Franklin (1954) Statistical Analysis in Chemistry and the Chemical Industry. New York: Wiley.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(stats); require(graphics)
plot(optden ~ carb, data = Formaldehyde,
xlab = "Carbohydrate (ml)", ylab = "Optical Density",
main = "Formaldehyde data", col = 4, las = 1)
abline(fm1 <- lm(optden ~ carb, data = Formaldehyde))
summary(fm1)
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0))
plot(fm1)
par(opar)
head(Formaldehyde)
example(Formaldehyde)
##
## Frmldh> require(stats); require(graphics)
##
## Frmldh> plot(optden ~ carb, data = Formaldehyde,
## Frmldh+ xlab = "Carbohydrate (ml)", ylab = "Optical Density",
## Frmldh+ main = "Formaldehyde data", col = 4, las = 1)
##
## Frmldh> abline(fm1 <- lm(optden ~ carb, data = Formaldehyde))
##
## Frmldh> summary(fm1)
##
## Call:
## lm(formula = optden ~ carb, data = Formaldehyde)
##
## Residuals:
## 1 2 3 4 5 6
## -0.006714 0.001029 0.002771 0.007143 0.007514 -0.011743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.005086 0.007834 0.649 0.552
## carb 0.876286 0.013535 64.744 3.41e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.008649 on 4 degrees of freedom
## Multiple R-squared: 0.999, Adjusted R-squared: 0.9988
## F-statistic: 4192 on 1 and 4 DF, p-value: 3.409e-07
##
##
## Frmldh> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0))
##
## Frmldh> plot(fm1)
##
## Frmldh> par(opar)
HairEyeColor Hair and Eye Color of Statistics Students
Description
Distribution of hair and eye color and sex in 592 statistics students.
Usage
HairEyeColor
Format
A 3-dimensional array resulting from cross-tabulating 592 observations on 3 variables. The variables and their levels are as follows:
No Name Levels
1 Hair Black, Brown, Red, Blond
2 Eye Brown, Blue, Hazel, Green
3 Sex Male, Female
Details
The Hair x Eye table comes from a survey of students at the University of Delaware reported by Snee (1974). The split by Sex was added by Friendly (1992a) for didactic purposes.
This data set is useful for illustrating various techniques for the analysis of contingency tables, such as the standard chi-squared test or, more generally, log-linear modelling, and graphical methods such as mosaic plots, sieve diagrams or association plots.
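As a sketch of the "standard chi-squared test" mentioned above (this code is illustrative and not part of the help page), the test can be applied to the two-way table aggregated over Sex, as in Snee's original data:

```r
## Sketch: test independence of hair and eye color, aggregating over Sex
x <- apply(HairEyeColor, c(1, 2), sum)  # collapse the 3rd dimension
chisq.test(x)  # strong evidence of association between hair and eye color
```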
Source
http://euclid.psych.yorku.ca/ftp/sas/vcd/catdata/haireye.sas
Snee (1974) gives the two-way table aggregated over Sex. The Sex split of the ‘Brown hair, Brown eye’ cell was changed to agree with that used by Friendly (2000).
References
Snee, R. D. (1974). Graphical display of two-way contingency tables. The American Statistician, 28, 9–12. doi: 10.2307/2683520.
Friendly, M. (1992a). Graphical methods for categorical data. SAS User Group International Conference Proceedings, 17, 190–200. http://www.math.yorku.ca/SCS/sugi/sugi17-paper.html
Friendly, M. (1992b). Mosaic displays for loglinear models. Proceedings of the Statistical Graphics Section, American Statistical Association, pp. 61–68. http://www.math.yorku.ca/SCS/Papers/asa92.html
Friendly, M. (2000). Visualizing Categorical Data. SAS Institute, ISBN 1-58025-660-0.
See Also
chisq.test, loglin, mosaicplot
Examples
require(graphics)
## Full mosaic
mosaicplot(HairEyeColor)
## Aggregate over sex (as in Snee's original data)
x <- apply(HairEyeColor, c(1, 2), sum)
x
mosaicplot(x, main = "Relation between hair and eye color")
head(HairEyeColor)
## [1] 32 53 10 3 11 50
example(HairEyeColor)
##
## HrEyCl> require(graphics)
##
## HrEyCl> ## Full mosaic
## HrEyCl> mosaicplot(HairEyeColor)
##
## HrEyCl> ## Aggregate over sex (as in Snee's original data)
## HrEyCl> x <- apply(HairEyeColor, c(1, 2), sum)
##
## HrEyCl> x
## Eye
## Hair Brown Blue Hazel Green
## Black 68 20 15 5
## Brown 119 84 54 29
## Red 26 17 14 14
## Blond 7 94 10 16
##
## HrEyCl> mosaicplot(x, main = "Relation between hair and eye color")
Harman23.cor Harman Example 2.3
Description
A correlation matrix of eight physical measurements on 305 girls between ages seven and seventeen.
Usage
Harman23.cor
Source
Harman, H. H. (1976) Modern Factor Analysis, Third Edition Revised, University of Chicago Press, Table 2.3.
Examples
require(stats)
(Harman23.FA <- factanal(factors = 1, covmat = Harman23.cor))
for(factors in 2:4) print(update(Harman23.FA, factors = factors))
head(Harman23.cor)
## $cov
## height arm.span forearm lower.leg weight bitro.diameter
## height 1.000 0.846 0.805 0.859 0.473 0.398
## arm.span 0.846 1.000 0.881 0.826 0.376 0.326
## forearm 0.805 0.881 1.000 0.801 0.380 0.319
## lower.leg 0.859 0.826 0.801 1.000 0.436 0.329
## weight 0.473 0.376 0.380 0.436 1.000 0.762
## bitro.diameter 0.398 0.326 0.319 0.329 0.762 1.000
## chest.girth 0.301 0.277 0.237 0.327 0.730 0.583
## chest.width 0.382 0.415 0.345 0.365 0.629 0.577
## chest.girth chest.width
## height 0.301 0.382
## arm.span 0.277 0.415
## forearm 0.237 0.345
## lower.leg 0.327 0.365
## weight 0.730 0.629
## bitro.diameter 0.583 0.577
## chest.girth 1.000 0.539
## chest.width 0.539 1.000
##
## $center
## [1] 0 0 0 0 0 0 0 0
##
## $n.obs
## [1] 305
example(Harman23.cor)
##
## Hrm23.> require(stats)
##
## Hrm23.> (Harman23.FA <- factanal(factors = 1, covmat = Harman23.cor))
##
## Call:
## factanal(factors = 1, covmat = Harman23.cor)
##
## Uniquenesses:
## height arm.span forearm lower.leg weight
## 0.158 0.135 0.190 0.187 0.760
## bitro.diameter chest.girth chest.width
## 0.829 0.877 0.801
##
## Loadings:
## Factor1
## height 0.918
## arm.span 0.930
## forearm 0.900
## lower.leg 0.902
## weight 0.490
## bitro.diameter 0.413
## chest.girth 0.351
## chest.width 0.446
##
## Factor1
## SS loadings 4.064
## Proportion Var 0.508
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 611.44 on 20 degrees of freedom.
## The p-value is 1.12e-116
##
## Hrm23.> for(factors in 2:4) print(update(Harman23.FA, factors = factors))
##
## Call:
## factanal(factors = factors, covmat = Harman23.cor)
##
## Uniquenesses:
## height arm.span forearm lower.leg weight
## 0.170 0.107 0.166 0.199 0.089
## bitro.diameter chest.girth chest.width
## 0.364 0.416 0.537
##
## Loadings:
## Factor1 Factor2
## height 0.865 0.287
## arm.span 0.927 0.181
## forearm 0.895 0.179
## lower.leg 0.859 0.252
## weight 0.233 0.925
## bitro.diameter 0.194 0.774
## chest.girth 0.134 0.752
## chest.width 0.278 0.621
##
## Factor1 Factor2
## SS loadings 3.335 2.617
## Proportion Var 0.417 0.327
## Cumulative Var 0.417 0.744
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 75.74 on 13 degrees of freedom.
## The p-value is 6.94e-11
##
## Call:
## factanal(factors = factors, covmat = Harman23.cor)
##
## Uniquenesses:
## height arm.span forearm lower.leg weight
## 0.127 0.005 0.193 0.157 0.090
## bitro.diameter chest.girth chest.width
## 0.359 0.411 0.490
##
## Loadings:
## Factor1 Factor2 Factor3
## height 0.886 0.267 -0.130
## arm.span 0.937 0.195 0.280
## forearm 0.874 0.188
## lower.leg 0.877 0.230 -0.145
## weight 0.242 0.916 -0.106
## bitro.diameter 0.193 0.777
## chest.girth 0.137 0.755
## chest.width 0.261 0.646 0.159
##
## Factor1 Factor2 Factor3
## SS loadings 3.379 2.628 0.162
## Proportion Var 0.422 0.329 0.020
## Cumulative Var 0.422 0.751 0.771
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 22.81 on 7 degrees of freedom.
## The p-value is 0.00184
##
## Call:
## factanal(factors = factors, covmat = Harman23.cor)
##
## Uniquenesses:
## height arm.span forearm lower.leg weight
## 0.137 0.005 0.191 0.116 0.138
## bitro.diameter chest.girth chest.width
## 0.283 0.178 0.488
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4
## height 0.879 0.277 -0.115
## arm.span 0.937 0.194 0.277
## forearm 0.875 0.191
## lower.leg 0.887 0.209 0.135 -0.188
## weight 0.246 0.882 0.111 -0.109
## bitro.diameter 0.187 0.822
## chest.girth 0.117 0.729 0.526
## chest.width 0.263 0.644 0.141
##
## Factor1 Factor2 Factor3 Factor4
## SS loadings 3.382 2.595 0.323 0.165
## Proportion Var 0.423 0.324 0.040 0.021
## Cumulative Var 0.423 0.747 0.787 0.808
##
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 4.63 on 2 degrees of freedom.
## The p-value is 0.0988
Harman74.cor Harman Example 7.4
Description
A correlation matrix of 24 psychological tests given to 145 seventh- and eighth-grade children in a Chicago suburb by Holzinger and Swineford.
Usage
Harman74.cor
Source
Harman, H. H. (1976) Modern Factor Analysis, Third Edition Revised, University of Chicago Press, Table 7.4.
Examples
require(stats)
(Harman74.FA <- factanal(factors = 1, covmat = Harman74.cor))
for(factors in 2:5) print(update(Harman74.FA, factors = factors))
Harman74.FA <- factanal(factors = 5, covmat = Harman74.cor,
rotation = "promax")
print(Harman74.FA$loadings, sort = TRUE)
head(Harman74.cor)
## $cov
## VisualPerception Cubes PaperFormBoard Flags
## VisualPerception 1.000 0.318 0.403 0.468
## Cubes 0.318 1.000 0.317 0.230
## PaperFormBoard 0.403 0.317 1.000 0.305
## Flags 0.468 0.230 0.305 1.000
## GeneralInformation 0.321 0.285 0.247 0.227
## PargraphComprehension 0.335 0.234 0.268 0.327
## SentenceCompletion 0.304 0.157 0.223 0.335
## WordClassification 0.332 0.157 0.382 0.391
## WordMeaning 0.326 0.195 0.184 0.325
## Addition 0.116 0.057 -0.075 0.099
## Code 0.308 0.150 0.091 0.110
## CountingDots 0.314 0.145 0.140 0.160
## StraightCurvedCapitals 0.489 0.239 0.321 0.327
## WordRecognition 0.125 0.103 0.177 0.066
## NumberRecognition 0.238 0.131 0.065 0.127
## FigureRecognition 0.414 0.272 0.263 0.322
## ObjectNumber 0.176 0.005 0.177 0.187
## NumberFigure 0.368 0.255 0.211 0.251
## FigureWord 0.270 0.112 0.312 0.137
## Deduction 0.365 0.292 0.297 0.339
## NumericalPuzzles 0.369 0.306 0.165 0.349
## ProblemReasoning 0.413 0.232 0.250 0.380
## SeriesCompletion 0.474 0.348 0.383 0.335
## ArithmeticProblems 0.282 0.211 0.203 0.248
## GeneralInformation PargraphComprehension
## VisualPerception 0.321 0.335
## Cubes 0.285 0.234
## PaperFormBoard 0.247 0.268
## Flags 0.227 0.327
## GeneralInformation 1.000 0.622
## PargraphComprehension 0.622 1.000
## SentenceCompletion 0.656 0.722
## WordClassification 0.578 0.527
## WordMeaning 0.723 0.714
## Addition 0.311 0.203
## Code 0.344 0.353
## CountingDots 0.215 0.095
## StraightCurvedCapitals 0.344 0.309
## WordRecognition 0.280 0.292
## NumberRecognition 0.229 0.251
## FigureRecognition 0.187 0.291
## ObjectNumber 0.208 0.273
## NumberFigure 0.263 0.167
## FigureWord 0.190 0.251
## Deduction 0.398 0.435
## NumericalPuzzles 0.318 0.263
## ProblemReasoning 0.441 0.386
## SeriesCompletion 0.435 0.431
## ArithmeticProblems 0.420 0.433
## SentenceCompletion WordClassification WordMeaning
## VisualPerception 0.304 0.332 0.326
## Cubes 0.157 0.157 0.195
## PaperFormBoard 0.223 0.382 0.184
## Flags 0.335 0.391 0.325
## GeneralInformation 0.656 0.578 0.723
## PargraphComprehension 0.722 0.527 0.714
## SentenceCompletion 1.000 0.619 0.685
## WordClassification 0.619 1.000 0.532
## WordMeaning 0.685 0.532 1.000
## Addition 0.246 0.285 0.170
## Code 0.232 0.300 0.280
## CountingDots 0.181 0.271 0.113
## StraightCurvedCapitals 0.345 0.395 0.280
## WordRecognition 0.236 0.252 0.260
## NumberRecognition 0.172 0.175 0.248
## FigureRecognition 0.180 0.296 0.242
## ObjectNumber 0.228 0.255 0.274
## NumberFigure 0.159 0.250 0.208
## FigureWord 0.226 0.274 0.274
## Deduction 0.451 0.427 0.446
## NumericalPuzzles 0.314 0.362 0.266
## ProblemReasoning 0.396 0.357 0.483
## SeriesCompletion 0.405 0.501 0.504
## ArithmeticProblems 0.437 0.388 0.424
## Addition Code CountingDots StraightCurvedCapitals
## VisualPerception 0.116 0.308 0.314 0.489
## Cubes 0.057 0.150 0.145 0.239
## PaperFormBoard -0.075 0.091 0.140 0.321
## Flags 0.099 0.110 0.160 0.327
## GeneralInformation 0.311 0.344 0.215 0.344
## PargraphComprehension 0.203 0.353 0.095 0.309
## SentenceCompletion 0.246 0.232 0.181 0.345
## WordClassification 0.285 0.300 0.271 0.395
## WordMeaning 0.170 0.280 0.113 0.280
## Addition 1.000 0.484 0.585 0.408
## Code 0.484 1.000 0.428 0.535
## CountingDots 0.585 0.428 1.000 0.512
## StraightCurvedCapitals 0.408 0.535 0.512 1.000
## WordRecognition 0.172 0.350 0.131 0.195
## NumberRecognition 0.154 0.240 0.173 0.139
## FigureRecognition 0.124 0.314 0.119 0.281
## ObjectNumber 0.289 0.362 0.278 0.194
## NumberFigure 0.317 0.350 0.349 0.323
## FigureWord 0.190 0.290 0.110 0.263
## Deduction 0.173 0.202 0.246 0.241
## NumericalPuzzles 0.405 0.399 0.355 0.425
## ProblemReasoning 0.160 0.304 0.193 0.279
## SeriesCompletion 0.262 0.251 0.350 0.382
## ArithmeticProblems 0.531 0.412 0.414 0.358
## WordRecognition NumberRecognition FigureRecognition
## VisualPerception 0.125 0.238 0.414
## Cubes 0.103 0.131 0.272
## PaperFormBoard 0.177 0.065 0.263
## Flags 0.066 0.127 0.322
## GeneralInformation 0.280 0.229 0.187
## PargraphComprehension 0.292 0.251 0.291
## SentenceCompletion 0.236 0.172 0.180
## WordClassification 0.252 0.175 0.296
## WordMeaning 0.260 0.248 0.242
## Addition 0.172 0.154 0.124
## Code 0.350 0.240 0.314
## CountingDots 0.131 0.173 0.119
## StraightCurvedCapitals 0.195 0.139 0.281
## WordRecognition 1.000 0.370 0.412
## NumberRecognition 0.370 1.000 0.325
## FigureRecognition 0.412 0.325 1.000
## ObjectNumber 0.341 0.345 0.324
## NumberFigure 0.201 0.334 0.344
## FigureWord 0.206 0.192 0.258
## Deduction 0.302 0.272 0.388
## NumericalPuzzles 0.183 0.232 0.348
## ProblemReasoning 0.243 0.246 0.283
## SeriesCompletion 0.242 0.256 0.360
## ArithmeticProblems 0.304 0.165 0.262
## ObjectNumber NumberFigure FigureWord Deduction
## VisualPerception 0.176 0.368 0.270 0.365
## Cubes 0.005 0.255 0.112 0.292
## PaperFormBoard 0.177 0.211 0.312 0.297
## Flags 0.187 0.251 0.137 0.339
## GeneralInformation 0.208 0.263 0.190 0.398
## PargraphComprehension 0.273 0.167 0.251 0.435
## SentenceCompletion 0.228 0.159 0.226 0.451
## WordClassification 0.255 0.250 0.274 0.427
## WordMeaning 0.274 0.208 0.274 0.446
## Addition 0.289 0.317 0.190 0.173
## Code 0.362 0.350 0.290 0.202
## CountingDots 0.278 0.349 0.110 0.246
## StraightCurvedCapitals 0.194 0.323 0.263 0.241
## WordRecognition 0.341 0.201 0.206 0.302
## NumberRecognition 0.345 0.334 0.192 0.272
## FigureRecognition 0.324 0.344 0.258 0.388
## ObjectNumber 1.000 0.448 0.324 0.262
## NumberFigure 0.448 1.000 0.358 0.301
## FigureWord 0.324 0.358 1.000 0.167
## Deduction 0.262 0.301 0.167 1.000
## NumericalPuzzles 0.173 0.357 0.331 0.413
## ProblemReasoning 0.273 0.317 0.342 0.463
## SeriesCompletion 0.287 0.272 0.303 0.509
## ArithmeticProblems 0.326 0.405 0.374 0.366
## NumericalPuzzles ProblemReasoning SeriesCompletion
## VisualPerception 0.369 0.413 0.474
## Cubes 0.306 0.232 0.348
## PaperFormBoard 0.165 0.250 0.383
## Flags 0.349 0.380 0.335
## GeneralInformation 0.318 0.441 0.435
## PargraphComprehension 0.263 0.386 0.431
## SentenceCompletion 0.314 0.396 0.405
## WordClassification 0.362 0.357 0.501
## WordMeaning 0.266 0.483 0.504
## Addition 0.405 0.160 0.262
## Code 0.399 0.304 0.251
## CountingDots 0.355 0.193 0.350
## StraightCurvedCapitals 0.425 0.279 0.382
## WordRecognition 0.183 0.243 0.242
## NumberRecognition 0.232 0.246 0.256
## FigureRecognition 0.348 0.283 0.360
## ObjectNumber 0.173 0.273 0.287
## NumberFigure 0.357 0.317 0.272
## FigureWord 0.331 0.342 0.303
## Deduction 0.413 0.463 0.509
## NumericalPuzzles 1.000 0.374 0.451
## ProblemReasoning 0.374 1.000 0.503
## SeriesCompletion 0.451 0.503 1.000
## ArithmeticProblems 0.448 0.375 0.434
## ArithmeticProblems
## VisualPerception 0.282
## Cubes 0.211
## PaperFormBoard 0.203
## Flags 0.248
## GeneralInformation 0.420
## PargraphComprehension 0.433
## SentenceCompletion 0.437
## WordClassification 0.388
## WordMeaning 0.424
## Addition 0.531
## Code 0.412
## CountingDots 0.414
## StraightCurvedCapitals 0.358
## WordRecognition 0.304
## NumberRecognition 0.165
## FigureRecognition 0.262
## ObjectNumber 0.326
## NumberFigure 0.405
## FigureWord 0.374
## Deduction 0.366
## NumericalPuzzles 0.448
## ProblemReasoning 0.375
## SeriesCompletion 0.434
## ArithmeticProblems 1.000
##
## $center
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## $n.obs
## [1] 145
example(Harman74.cor)
##
## Hrm74.> require(stats)
##
## Hrm74.> (Harman74.FA <- factanal(factors = 1, covmat = Harman74.cor))
##
## Call:
## factanal(factors = 1, covmat = Harman74.cor)
##
## Uniquenesses:
## VisualPerception Cubes PaperFormBoard
## 0.677 0.866 0.830
## Flags GeneralInformation PargraphComprehension
## 0.768 0.487 0.491
## SentenceCompletion WordClassification WordMeaning
## 0.500 0.514 0.474
## Addition Code CountingDots
## 0.818 0.731 0.824
## StraightCurvedCapitals WordRecognition NumberRecognition
## 0.681 0.833 0.863
## FigureRecognition ObjectNumber NumberFigure
## 0.775 0.812 0.778
## FigureWord Deduction NumericalPuzzles
## 0.816 0.612 0.676
## ProblemReasoning SeriesCompletion ArithmeticProblems
## 0.619 0.524 0.593
##
## Loadings:
## Factor1
## VisualPerception 0.569
## Cubes 0.366
## PaperFormBoard 0.412
## Flags 0.482
## GeneralInformation 0.716
## PargraphComprehension 0.713
## SentenceCompletion 0.707
## WordClassification 0.697
## WordMeaning 0.725
## Addition 0.426
## Code 0.519
## CountingDots 0.419
## StraightCurvedCapitals 0.565
## WordRecognition 0.408
## NumberRecognition 0.370
## FigureRecognition 0.474
## ObjectNumber 0.434
## NumberFigure 0.471
## FigureWord 0.429
## Deduction 0.623
## NumericalPuzzles 0.569
## ProblemReasoning 0.617
## SeriesCompletion 0.690
## ArithmeticProblems 0.638
##
## Factor1
## SS loadings 7.438
## Proportion Var 0.310
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 622.91 on 252 degrees of freedom.
## The p-value is 2.28e-33
##
## Hrm74.> for(factors in 2:5) print(update(Harman74.FA, factors = factors))
##
## Call:
## factanal(factors = factors, covmat = Harman74.cor)
##
## Uniquenesses:
## VisualPerception Cubes PaperFormBoard
## 0.650 0.864 0.844
## Flags GeneralInformation PargraphComprehension
## 0.778 0.375 0.316
## SentenceCompletion WordClassification WordMeaning
## 0.319 0.503 0.258
## Addition Code CountingDots
## 0.670 0.608 0.581
## StraightCurvedCapitals WordRecognition NumberRecognition
## 0.567 0.832 0.850
## FigureRecognition ObjectNumber NumberFigure
## 0.743 0.770 0.625
## FigureWord Deduction NumericalPuzzles
## 0.792 0.629 0.579
## ProblemReasoning SeriesCompletion ArithmeticProblems
## 0.634 0.539 0.553
##
## Loadings:
## Factor1 Factor2
## VisualPerception 0.506 0.306
## Cubes 0.304 0.209
## PaperFormBoard 0.297 0.260
## Flags 0.327 0.339
## GeneralInformation 0.240 0.753
## PargraphComprehension 0.171 0.809
## SentenceCompletion 0.163 0.809
## WordClassification 0.344 0.615
## WordMeaning 0.148 0.849
## Addition 0.563 0.115
## Code 0.591 0.207
## CountingDots 0.647
## StraightCurvedCapitals 0.612 0.241
## WordRecognition 0.315 0.263
## NumberRecognition 0.328 0.205
## FigureRecognition 0.457 0.218
## ObjectNumber 0.431 0.209
## NumberFigure 0.601 0.116
## FigureWord 0.399 0.222
## Deduction 0.379 0.477
## NumericalPuzzles 0.604 0.237
## ProblemReasoning 0.390 0.462
## SeriesCompletion 0.486 0.474
## ArithmeticProblems 0.544 0.389
##
## Factor1 Factor2
## SS loadings 4.573 4.548
## Proportion Var 0.191 0.190
## Cumulative Var 0.191 0.380
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 420.24 on 229 degrees of freedom.
## The p-value is 2.01e-13
##
## Call:
## factanal(factors = factors, covmat = Harman74.cor)
##
## Uniquenesses:
## VisualPerception Cubes PaperFormBoard
## 0.500 0.793 0.662
## Flags GeneralInformation PargraphComprehension
## 0.694 0.352 0.316
## SentenceCompletion WordClassification WordMeaning
## 0.300 0.502 0.256
## Addition Code CountingDots
## 0.200 0.586 0.494
## StraightCurvedCapitals WordRecognition NumberRecognition
## 0.569 0.838 0.848
## FigureRecognition ObjectNumber NumberFigure
## 0.643 0.780 0.635
## FigureWord Deduction NumericalPuzzles
## 0.788 0.590 0.580
## ProblemReasoning SeriesCompletion ArithmeticProblems
## 0.597 0.498 0.500
##
## Loadings:
## Factor1 Factor2 Factor3
## VisualPerception 0.176 0.656 0.198
## Cubes 0.122 0.428
## PaperFormBoard 0.145 0.563
## Flags 0.239 0.487 0.107
## GeneralInformation 0.745 0.191 0.237
## PargraphComprehension 0.780 0.249 0.118
## SentenceCompletion 0.802 0.175 0.160
## WordClassification 0.571 0.327 0.256
## WordMeaning 0.821 0.248
## Addition 0.162 -0.118 0.871
## Code 0.198 0.219 0.572
## CountingDots 0.179 0.688
## StraightCurvedCapitals 0.190 0.381 0.499
## WordRecognition 0.231 0.253 0.210
## NumberRecognition 0.158 0.299 0.195
## FigureRecognition 0.108 0.557 0.186
## ObjectNumber 0.178 0.267 0.342
## NumberFigure 0.427 0.424
## FigureWord 0.167 0.355 0.240
## Deduction 0.392 0.472 0.181
## NumericalPuzzles 0.178 0.406 0.473
## ProblemReasoning 0.382 0.473 0.182
## SeriesCompletion 0.379 0.528 0.283
## ArithmeticProblems 0.377 0.226 0.554
##
## Factor1 Factor2 Factor3
## SS loadings 3.802 3.488 3.186
## Proportion Var 0.158 0.145 0.133
## Cumulative Var 0.158 0.304 0.436
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 295.59 on 207 degrees of freedom.
## The p-value is 5.12e-05
##
## Call:
## factanal(factors = factors, covmat = Harman74.cor)
##
## Uniquenesses:
## VisualPerception Cubes PaperFormBoard
## 0.438 0.780 0.644
## Flags GeneralInformation PargraphComprehension
## 0.651 0.352 0.312
## SentenceCompletion WordClassification WordMeaning
## 0.283 0.485 0.257
## Addition Code CountingDots
## 0.240 0.551 0.435
## StraightCurvedCapitals WordRecognition NumberRecognition
## 0.491 0.646 0.696
## FigureRecognition ObjectNumber NumberFigure
## 0.549 0.598 0.593
## FigureWord Deduction NumericalPuzzles
## 0.762 0.592 0.583
## ProblemReasoning SeriesCompletion ArithmeticProblems
## 0.601 0.497 0.500
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4
## VisualPerception 0.160 0.689 0.187 0.160
## Cubes 0.117 0.436
## PaperFormBoard 0.137 0.570 0.110
## Flags 0.233 0.527
## GeneralInformation 0.739 0.185 0.213 0.150
## PargraphComprehension 0.767 0.205 0.233
## SentenceCompletion 0.806 0.197 0.153
## WordClassification 0.569 0.339 0.242 0.132
## WordMeaning 0.806 0.201 0.227
## Addition 0.167 -0.118 0.831 0.166
## Code 0.180 0.120 0.512 0.374
## CountingDots 0.210 0.716
## StraightCurvedCapitals 0.188 0.438 0.525
## WordRecognition 0.197 0.553
## NumberRecognition 0.122 0.116 0.520
## FigureRecognition 0.408 0.525
## ObjectNumber 0.142 0.219 0.574
## NumberFigure 0.293 0.336 0.456
## FigureWord 0.148 0.239 0.161 0.365
## Deduction 0.378 0.402 0.118 0.301
## NumericalPuzzles 0.175 0.381 0.438 0.223
## ProblemReasoning 0.366 0.399 0.123 0.301
## SeriesCompletion 0.369 0.500 0.244 0.239
## ArithmeticProblems 0.370 0.158 0.496 0.304
##
## Factor1 Factor2 Factor3 Factor4
## SS loadings 3.647 2.872 2.657 2.290
## Proportion Var 0.152 0.120 0.111 0.095
## Cumulative Var 0.152 0.272 0.382 0.478
##
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 226.68 on 186 degrees of freedom.
## The p-value is 0.0224
##
## Call:
## factanal(factors = factors, covmat = Harman74.cor)
##
## Uniquenesses:
## VisualPerception Cubes PaperFormBoard
## 0.450 0.781 0.639
## Flags GeneralInformation PargraphComprehension
## 0.649 0.357 0.288
## SentenceCompletion WordClassification WordMeaning
## 0.277 0.485 0.262
## Addition Code CountingDots
## 0.215 0.386 0.444
## StraightCurvedCapitals WordRecognition NumberRecognition
## 0.256 0.639 0.706
## FigureRecognition ObjectNumber NumberFigure
## 0.550 0.614 0.596
## FigureWord Deduction NumericalPuzzles
## 0.764 0.521 0.564
## ProblemReasoning SeriesCompletion ArithmeticProblems
## 0.580 0.442 0.478
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5
## VisualPerception 0.161 0.658 0.136 0.182 0.199
## Cubes 0.113 0.435 0.107
## PaperFormBoard 0.135 0.562 0.107 0.116
## Flags 0.231 0.533
## GeneralInformation 0.736 0.188 0.192 0.162
## PargraphComprehension 0.775 0.187 0.251 0.113
## SentenceCompletion 0.809 0.208 0.136
## WordClassification 0.568 0.348 0.223 0.131
## WordMeaning 0.800 0.215 0.224
## Addition 0.175 -0.100 0.844 0.176
## Code 0.185 0.438 0.451 0.426
## CountingDots 0.222 0.690 0.101 0.140
## StraightCurvedCapitals 0.186 0.425 0.458 0.559
## WordRecognition 0.197 0.557
## NumberRecognition 0.121 0.130 0.508
## FigureRecognition 0.400 0.529
## ObjectNumber 0.145 0.208 0.562
## NumberFigure 0.306 0.325 0.452
## FigureWord 0.147 0.242 0.145 0.364
## Deduction 0.370 0.452 0.139 0.287 -0.190
## NumericalPuzzles 0.170 0.402 0.439 0.230
## ProblemReasoning 0.358 0.423 0.126 0.302
## SeriesCompletion 0.360 0.549 0.256 0.223 -0.107
## ArithmeticProblems 0.371 0.185 0.502 0.307
##
## Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings 3.632 2.964 2.456 2.345 0.663
## Proportion Var 0.151 0.124 0.102 0.098 0.028
## Cumulative Var 0.151 0.275 0.377 0.475 0.503
##
## Test of the hypothesis that 5 factors are sufficient.
## The chi square statistic is 186.82 on 166 degrees of freedom.
## The p-value is 0.128
##
## Hrm74.> Harman74.FA <- factanal(factors = 5, covmat = Harman74.cor,
## Hrm74.+ rotation = "promax")
##
## Hrm74.> print(Harman74.FA$loadings, sort = TRUE)
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5
## VisualPerception 0.831 -0.127 0.230
## Cubes 0.534
## PaperFormBoard 0.736 -0.290 0.136
## Flags 0.647 -0.104
## SeriesCompletion 0.555 0.126 0.127
## GeneralInformation 0.764
## PargraphComprehension 0.845 -0.140 0.140
## SentenceCompletion 0.872 -0.140
## WordClassification 0.277 0.505 0.104
## WordMeaning 0.846 -0.108
## Addition -0.334 1.012
## CountingDots 0.206 -0.200 0.722 0.185
## ArithmeticProblems 0.197 0.500 0.139
## WordRecognition -0.126 0.127 -0.103 0.657
## NumberRecognition 0.568
## FigureRecognition 0.399 -0.142 -0.207 0.562
## ObjectNumber -0.108 0.107 0.613
## StraightCurvedCapitals 0.542 0.247 0.618
## Code 0.112 0.288 0.486 0.424
## NumberFigure 0.255 -0.230 0.211 0.413
## FigureWord 0.187 0.347
## Deduction 0.404 0.169 0.117 -0.203
## NumericalPuzzles 0.393 0.368
## ProblemReasoning 0.381 0.188 0.169
##
## Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings 3.529 3.311 2.367 2.109 0.762
## Proportion Var 0.147 0.138 0.099 0.088 0.032
## Cumulative Var 0.147 0.285 0.384 0.471 0.503
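The chi-square tests above (p ≈ 2e-33 for one factor rising to p ≈ 0.13 for five) are the usual basis for choosing the number of factors: stop at the first non-significant fit. A compact way to tabulate the p-values — a sketch, not part of the original help page:

```r
## Chi-square p-values for 1-5 factors; the first value above 0.05
## (here, 5 factors) is the conventional stopping point.
pvals <- sapply(1:5, function(k)
  factanal(factors = k, covmat = Harman74.cor)$PVAL)
round(pvals, 4)
```

`factanal` picks up the number of observations (`n.obs = 145`) from the `Harman74.cor` list itself, so no extra argument is needed.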
Indometh Pharmacokinetics of Indomethacin
Description
The Indometh data frame has 66 rows and 3 columns of data on the pharmacokinetics of indometacin (or, older spelling, ‘indomethacin’).
Usage
Indometh
Format
An object of class c("nfnGroupedData", "nfGroupedData", "groupedData", "data.frame") containing the following columns:
Subject
an ordered factor containing the subject codes. The ordering is by increasing maximum response.
time
a numeric vector of times at which blood samples were drawn (hr).
conc
a numeric vector of plasma concentrations of indometacin (mcg/ml).
Details
Each of the six subjects was given an intravenous injection of indometacin.
This dataset was originally part of package nlme, which has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
Source
Kwan, Breault, Umbenhauer, McMahon and Duggan (1976) Kinetics of Indomethacin absorption, elimination, and enterohepatic circulation in man. Journal of Pharmacokinetics and Biopharmaceutics 4, 255–280.
Davidian, M. and Giltinan, D. M. (1995) Nonlinear Models for Repeated Measurement Data, Chapman & Hall (section 5.2.4, p. 129)
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer.
See Also
SSbiexp for models fitted to this dataset.
head(Indometh)
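The See Also entry points to SSbiexp; fitting that two-compartment (biexponential) model to a single subject is a one-liner. A sketch along the lines of the SSbiexp help page, not part of this one:

```r
## Biexponential fit for one subject, using the self-starting model
## SSbiexp referenced under See Also.
fm <- nls(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2),
          data = Indometh, subset = Subject == 1)
coef(fm)  # A1, lrc1, A2, lrc2
```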
InsectSprays Effectiveness of Insect Sprays
Description
The counts of insects in agricultural experimental units treated with different insecticides.
Usage
InsectSprays
Format
A data frame with 72 observations on 2 variables.
[,1] count numeric Insect count
[,2] spray factor The type of spray
Source
Beall, G. (1942) The transformation of data from entomological field experiments, Biometrika, 29, 243–262.
References
McNeil, D. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(stats); require(graphics)
boxplot(count ~ spray, data = InsectSprays,
xlab = "Type of spray", ylab = "Insect count",
main = "InsectSprays data", varwidth = TRUE, col = "lightgray")
fm1 <- aov(count ~ spray, data = InsectSprays)
summary(fm1)
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0))
plot(fm1)
fm2 <- aov(sqrt(count) ~ spray, data = InsectSprays)
summary(fm2)
plot(fm2)
par(opar)
head(InsectSprays)
example(InsectSprays)
##
## InsctS> require(stats); require(graphics)
##
## InsctS> boxplot(count ~ spray, data = InsectSprays,
## InsctS+ xlab = "Type of spray", ylab = "Insect count",
## InsctS+ main = "InsectSprays data", varwidth = TRUE, col = "lightgray")
##
## InsctS> fm1 <- aov(count ~ spray, data = InsectSprays)
##
## InsctS> summary(fm1)
## Df Sum Sq Mean Sq F value Pr(>F)
## spray 5 2669 533.8 34.7 <2e-16 ***
## Residuals 66 1015 15.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## InsctS> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0))
##
## InsctS> plot(fm1)
##
## InsctS> fm2 <- aov(sqrt(count) ~ spray, data = InsectSprays)
##
## InsctS> summary(fm2)
## Df Sum Sq Mean Sq F value Pr(>F)
## spray 5 88.44 17.688 44.8 <2e-16 ***
## Residuals 66 26.06 0.395
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## InsctS> plot(fm2)
##
## InsctS> par(opar)
JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
Description
Quarterly earnings (dollars) per Johnson & Johnson share 1960–80.
Usage
JohnsonJohnson
Format
A quarterly time series
Source
Shumway, R. H. and Stoffer, D. S. (2000) Time Series Analysis and its Applications. Second Edition. Springer. Example 1.1.
Examples
require(stats); require(graphics)
JJ <- log10(JohnsonJohnson)
plot(JJ)
## This example gives a possible-non-convergence warning on some
## platforms, but does seem to converge on x86 Linux and Windows.
(fit <- StructTS(JJ, type = "BSM"))
tsdiag(fit)
sm <- tsSmooth(fit)
plot(cbind(JJ, sm[, 1], sm[, 3]-0.5), plot.type = "single",
col = c("black", "green", "blue"))
abline(h = -0.5, col = "grey60")
monthplot(fit)
head(JohnsonJohnson)
## [1] 0.71 0.63 0.85 0.44 0.61 0.69
example(JohnsonJohnson)
##
## JhnsnJ> ## No test:
## JhnsnJ> ##D require(stats); require(graphics)
## JhnsnJ> ##D JJ <- log10(JohnsonJohnson)
## JhnsnJ> ##D plot(JJ)
## JhnsnJ> ##D ## This example gives a possible-non-convergence warning on some
## JhnsnJ> ##D ## platforms, but does seem to converge on x86 Linux and Windows.
## JhnsnJ> ##D (fit <- StructTS(JJ, type = "BSM"))
## JhnsnJ> ##D tsdiag(fit)
## JhnsnJ> ##D sm <- tsSmooth(fit)
## JhnsnJ> ##D plot(cbind(JJ, sm[, 1], sm[, 3]-0.5), plot.type = "single",
## JhnsnJ> ##D col = c("black", "green", "blue"))
## JhnsnJ> ##D abline(h = -0.5, col = "grey60")
## JhnsnJ> ##D
## JhnsnJ> ##D monthplot(fit)
## JhnsnJ> ## End(No test)
## JhnsnJ>
## JhnsnJ>
LakeHuron Level of Lake Huron 1875-1972
Description
Annual measurements of the level, in feet, of Lake Huron 1875–1972.
Usage
LakeHuron
Format
A time series of length 98.
Source
Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. Second edition. Springer, New York. Series A, page 555.
Brockwell, P. J. and Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer, New York. Sections 5.1 and 7.6.
head(LakeHuron)
## [1] 580.38 581.86 580.97 580.80 579.79 580.39
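This help page lists no Examples; since Brockwell and Davis use the series as a standard autoregression illustration, a minimal sketch (my addition, not from the help page):

```r
## AIC-selected autoregression on the annual Lake Huron levels,
## in the spirit of the Brockwell and Davis references above.
fit <- ar(LakeHuron)
fit$order  # order chosen by AIC
fit$ar     # the fitted AR coefficients
```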
LifeCycleSavings Intercountry Life-Cycle Savings Data
Description
Data on the savings ratio 1960–1970.
Usage
LifeCycleSavings
Format
A data frame with 50 observations on 5 variables.
[,1] sr numeric aggregate personal savings
[,2] pop15 numeric % of population under 15
[,3] pop75 numeric % of population over 75
[,4] dpi numeric real per-capita disposable income
[,5] ddpi numeric % growth rate of dpi
Details
Under the life-cycle savings hypothesis as developed by Franco Modigliani, the savings ratio (aggregate personal saving divided by disposable income) is explained by per-capita disposable income, the percentage rate of change in per-capita disposable income, and two demographic variables: the percentage of population less than 15 years old and the percentage of the population over 75 years old. The data are averaged over the decade 1960–1970 to remove the business cycle or other short-term fluctuations.
Source
The data were obtained from Belsley, Kuh and Welsch (1980). They in turn obtained the data from Sterling (1977).
References
Sterling, Arnie (1977) Unpublished BS Thesis. Massachusetts Institute of Technology.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980) Regression Diagnostics. New York: Wiley.
Examples
require(stats); require(graphics)
pairs(LifeCycleSavings, panel = panel.smooth,
main = "LifeCycleSavings data")
fm1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
summary(fm1)
head(LifeCycleSavings)
example(LifeCycleSavings)
##
## LfCycS> require(stats); require(graphics)
##
## LfCycS> pairs(LifeCycleSavings, panel = panel.smooth,
## LfCycS+ main = "LifeCycleSavings data")
##
## LfCycS> fm1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
##
## LfCycS> summary(fm1)
##
## Call:
## lm(formula = sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2422 -2.6857 -0.2488 2.4280 9.7509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.5660865 7.3545161 3.884 0.000334 ***
## pop15 -0.4611931 0.1446422 -3.189 0.002603 **
## pop75 -1.6914977 1.0835989 -1.561 0.125530
## dpi -0.0003369 0.0009311 -0.362 0.719173
## ddpi 0.4096949 0.1961971 2.088 0.042471 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.803 on 45 degrees of freedom
## Multiple R-squared: 0.3385, Adjusted R-squared: 0.2797
## F-statistic: 5.756 on 4 and 45 DF, p-value: 0.0007904
Loblolly Growth of Loblolly pine trees
Description
The Loblolly data frame has 84 rows and 3 columns of records of the growth of Loblolly pine trees.
Usage
Loblolly
Format
An object of class c("nfnGroupedData", "nfGroupedData", "groupedData", "data.frame") containing the following columns:
height
a numeric vector of tree heights (ft).
age
a numeric vector of tree ages (yr).
Seed
an ordered factor indicating the seed source for the tree. The ordering is according to increasing maximum height.
Details
This dataset was originally part of package nlme, which has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
Source
Kung, F. H. (1986), Fitting logistic growth curve with predetermined carrying capacity, in Proceedings of the Statistical Computing Section, American Statistical Association, 340–343.
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer.
Examples
require(stats); require(graphics)
plot(height ~ age, data = Loblolly, subset = Seed == 329,
xlab = "Tree age (yr)", las = 1,
ylab = "Tree height (ft)",
main = "Loblolly data and fitted curve (Seed 329 only)")
fm1 <- nls(height ~ SSasymp(age, Asym, R0, lrc),
data = Loblolly, subset = Seed == 329)
age <- seq(0, 30, length.out = 101)
lines(age, predict(fm1, list(age = age)))
head(Loblolly)
example(Loblolly)
##
## Lbllly> require(stats); require(graphics)
##
## Lbllly> plot(height ~ age, data = Loblolly, subset = Seed == 329,
## Lbllly+ xlab = "Tree age (yr)", las = 1,
## Lbllly+ ylab = "Tree height (ft)",
## Lbllly+ main = "Loblolly data and fitted curve (Seed 329 only)")
##
## Lbllly> fm1 <- nls(height ~ SSasymp(age, Asym, R0, lrc),
## Lbllly+ data = Loblolly, subset = Seed == 329)
##
## Lbllly> age <- seq(0, 30, length.out = 101)
##
## Lbllly> lines(age, predict(fm1, list(age = age)))
Nile Flow of the River Nile
Description
Measurements of the annual flow of the river Nile at Aswan (formerly Assuan), 1871–1970, in 10^8 m^3, “with apparent changepoint near 1898” (Cobb (1978), Table 1, p. 249).
Usage
Nile
Format
A time series of length 100.
Source
Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press. http://www.ssfpack.com/DKbook.html
References
Balke, N. S. (1993). Detecting level shifts in time series. Journal of Business and Economic Statistics, 11, 81–92. doi: 10.2307/1391308.
Cobb, G. W. (1978). The problem of the Nile: conditional solution to a change-point problem. Biometrika 65, 243–51. doi: 10.2307/2335202.
Examples
require(stats); require(graphics)
par(mfrow = c(2, 2))
plot(Nile)
acf(Nile)
pacf(Nile)
ar(Nile) # selects order 2
cpgram(ar(Nile)$resid)
par(mfrow = c(1, 1))
arima(Nile, c(2, 0, 0))
## Now consider missing values, following Durbin & Koopman
NileNA <- Nile
NileNA[c(21:40, 61:80)] <- NA
arima(NileNA, c(2, 0, 0))
plot(NileNA)
pred <-
predict(arima(window(NileNA, 1871, 1890), c(2, 0, 0)), n.ahead = 20)
lines(pred$pred, lty = 3, col = "red")
lines(pred$pred + 2*pred$se, lty = 2, col = "blue")
lines(pred$pred - 2*pred$se, lty = 2, col = "blue")
pred <-
predict(arima(window(NileNA, 1871, 1930), c(2, 0, 0)), n.ahead = 20)
lines(pred$pred, lty = 3, col = "red")
lines(pred$pred + 2*pred$se, lty = 2, col = "blue")
lines(pred$pred - 2*pred$se, lty = 2, col = "blue")
## Structural time series models
par(mfrow = c(3, 1))
plot(Nile)
## local level model
(fit <- StructTS(Nile, type = "level"))
lines(fitted(fit), lty = 2) # contemporaneous smoothing
lines(tsSmooth(fit), lty = 2, col = 4) # fixed-interval smoothing
plot(residuals(fit)); abline(h = 0, lty = 3)
## local trend model
(fit2 <- StructTS(Nile, type = "trend")) ## constant trend fitted
pred <- predict(fit, n.ahead = 30)
## with 50% confidence interval
ts.plot(Nile, pred$pred,
pred$pred + 0.67*pred$se, pred$pred -0.67*pred$se)
## Now consider missing values
plot(NileNA)
(fit3 <- StructTS(NileNA, type = "level"))
lines(fitted(fit3), lty = 2)
lines(tsSmooth(fit3), lty = 3)
plot(residuals(fit3)); abline(h = 0, lty = 3)
head(Nile)
## [1] 1120 1160 963 1210 1160 1160
example(Nile)
##
## Nile> require(stats); require(graphics)
##
## Nile> par(mfrow = c(2, 2))
##
## Nile> plot(Nile)
##
## Nile> acf(Nile)
##
## Nile> pacf(Nile)
##
## Nile> ar(Nile) # selects order 2
##
## Call:
## ar(x = Nile)
##
## Coefficients:
## 1 2
## 0.4081 0.1812
##
## Order selected 2 sigma^2 estimated as 21247
##
## Nile> cpgram(ar(Nile)$resid)
##
## Nile> par(mfrow = c(1, 1))
##
## Nile> arima(Nile, c(2, 0, 0))
##
## Call:
## arima(x = Nile, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 0.4096 0.1987 919.8397
## s.e. 0.0974 0.0990 35.6410
##
## sigma^2 estimated as 20291: log likelihood = -637.98, aic = 1283.96
##
## Nile> ## Now consider missing values, following Durbin & Koopman
## Nile> NileNA <- Nile
##
## Nile> NileNA[c(21:40, 61:80)] <- NA
##
## Nile> arima(NileNA, c(2, 0, 0))
##
## Call:
## arima(x = NileNA, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 0.3622 0.1678 918.3103
## s.e. 0.1273 0.1323 39.5037
##
## sigma^2 estimated as 23676: log likelihood = -387.7, aic = 783.41
##
## Nile> plot(NileNA)
##
## Nile> pred <-
## Nile+ predict(arima(window(NileNA, 1871, 1890), c(2, 0, 0)), n.ahead = 20)
##
## Nile> lines(pred$pred, lty = 3, col = "red")
##
## Nile> lines(pred$pred + 2*pred$se, lty = 2, col = "blue")
##
## Nile> lines(pred$pred - 2*pred$se, lty = 2, col = "blue")
##
## Nile> pred <-
## Nile+ predict(arima(window(NileNA, 1871, 1930), c(2, 0, 0)), n.ahead = 20)
##
## Nile> lines(pred$pred, lty = 3, col = "red")
##
## Nile> lines(pred$pred + 2*pred$se, lty = 2, col = "blue")
##
## Nile> lines(pred$pred - 2*pred$se, lty = 2, col = "blue")
##
## Nile> ## Structural time series models
## Nile> par(mfrow = c(3, 1))
##
## Nile> plot(Nile)
##
## Nile> ## local level model
## Nile> (fit <- StructTS(Nile, type = "level"))
##
## Call:
## StructTS(x = Nile, type = "level")
##
## Variances:
## level epsilon
## 1469 15099
##
## Nile> lines(fitted(fit), lty = 2) # contemporaneous smoothing
##
## Nile> lines(tsSmooth(fit), lty = 2, col = 4) # fixed-interval smoothing
##
## Nile> plot(residuals(fit)); abline(h = 0, lty = 3)
##
## Nile> ## local trend model
## Nile> (fit2 <- StructTS(Nile, type = "trend")) ## constant trend fitted
##
## Call:
## StructTS(x = Nile, type = "trend")
##
## Variances:
## level slope epsilon
## 1427 0 15047
##
## Nile> pred <- predict(fit, n.ahead = 30)
##
## Nile> ## with 50% confidence interval
## Nile> ts.plot(Nile, pred$pred,
## Nile+ pred$pred + 0.67*pred$se, pred$pred -0.67*pred$se)
##
## Nile> ## Now consider missing values
## Nile> plot(NileNA)
##
## Nile> (fit3 <- StructTS(NileNA, type = "level"))
##
## Call:
## StructTS(x = NileNA, type = "level")
##
## Variances:
## level epsilon
## 685.8 17899.8
##
## Nile> lines(fitted(fit3), lty = 2)
##
## Nile> lines(tsSmooth(fit3), lty = 3)
##
## Nile> plot(residuals(fit3)); abline(h = 0, lty = 3)
Orange Growth of Orange Trees
Description
The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees.
Usage
Orange
Format
An object of class c("nfnGroupedData", "nfGroupedData", "groupedData", "data.frame") containing the following columns:
Tree
an ordered factor indicating the tree on which the measurement is made. The ordering is according to increasing maximum diameter.
age
a numeric vector giving the age of the tree (days since 1968/12/31)
circumference
a numeric vector of trunk circumferences (mm). This is probably “circumference at breast height”, a standard measurement in forestry.
Details
This dataset was originally part of package nlme, which has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
Source
Draper, N. R. and Smith, H. (1998), Applied Regression Analysis (3rd ed), Wiley (exercise 24.N).
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer.
Examples
require(stats); require(graphics)
coplot(circumference ~ age | Tree, data = Orange, show.given = FALSE)
fm1 <- nls(circumference ~ SSlogis(age, Asym, xmid, scal),
data = Orange, subset = Tree == 3)
plot(circumference ~ age, data = Orange, subset = Tree == 3,
xlab = "Tree age (days since 1968/12/31)",
ylab = "Tree circumference (mm)", las = 1,
main = "Orange tree data and fitted model (Tree 3 only)")
age <- seq(0, 1600, length.out = 101)
lines(age, predict(fm1, list(age = age)))
head(Orange)
example(Orange)
##
## Orange> require(stats); require(graphics)
##
## Orange> coplot(circumference ~ age | Tree, data = Orange, show.given = FALSE)
##
## Orange> fm1 <- nls(circumference ~ SSlogis(age, Asym, xmid, scal),
## Orange+ data = Orange, subset = Tree == 3)
##
## Orange> plot(circumference ~ age, data = Orange, subset = Tree == 3,
## Orange+ xlab = "Tree age (days since 1968/12/31)",
## Orange+ ylab = "Tree circumference (mm)", las = 1,
## Orange+ main = "Orange tree data and fitted model (Tree 3 only)")
##
## Orange> age <- seq(0, 1600, length.out = 101)
##
## Orange> lines(age, predict(fm1, list(age = age)))
OrchardSprays Potency of Orchard Sprays
Description
An experiment was conducted to assess the potency of various constituents of orchard sprays in repelling honeybees, using a Latin square design.
Usage
OrchardSprays
Format
A data frame with 64 observations on 4 variables.
[,1] rowpos numeric Row of the design
[,2] colpos numeric Column of the design
[,3] treatment factor Treatment level
[,4] decrease numeric Response
Details
Individual cells of dry comb were filled with measured amounts of lime sulphur emulsion in sucrose solution. Seven different concentrations of lime sulphur ranging from a concentration of 1/100 to 1/1,562,500 in successive factors of 1/5 were used as well as a solution containing no lime sulphur.
The responses for the different solutions were obtained by releasing 100 bees into the chamber for two hours, and then measuring the decrease in volume of the solutions in the various cells.
An 8 x 8 Latin square design was used and the treatments were coded as follows:
A  highest level of lime sulphur
B  next highest level of lime sulphur
...
G  lowest level of lime sulphur
H  no lime sulphur
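The dose-response implied by this coding can be checked directly from the data. A minimal sketch (using only the OrchardSprays data frame described above):

```r
## Mean decrease in solution volume for each treatment level.
## Treatments run from A (highest lime sulphur concentration) down to
## H (no lime sulphur), so A should show the smallest mean decrease
## (strongest repellency of the bees) and H the largest.
with(OrchardSprays, tapply(decrease, treatment, mean))
```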
Source
Finney, D. J. (1947) Probit Analysis. Cambridge.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(graphics)
pairs(OrchardSprays, main = "OrchardSprays data")
head(OrchardSprays)
example(OrchardSprays)
##
## OrchrS> require(graphics)
##
## OrchrS> pairs(OrchardSprays, main = "OrchardSprays data")
PlantGrowth Results from an Experiment on Plant Growth
Description
Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.
Usage
PlantGrowth
Format
A data frame of 30 cases on 2 variables.
[, 1] weight numeric
[, 2] group factor
The levels of group are ‘ctrl’, ‘trt1’, and ‘trt2’.
Source
Dobson, A. J. (1983) An Introduction to Statistical Modelling. London: Chapman and Hall.
Examples
## One factor ANOVA example from Dobson's book, cf. Table 7.4:
require(stats); require(graphics)
boxplot(weight ~ group, data = PlantGrowth, main = "PlantGrowth data",
ylab = "Dried weight of plants", col = "lightgray",
notch = TRUE, varwidth = TRUE)
anova(lm(weight ~ group, data = PlantGrowth))
head(PlantGrowth)
example(PlantGrowth)
##
## PlntGr> ## One factor ANOVA example from Dobson's book, cf. Table 7.4:
## PlntGr> require(stats); require(graphics)
##
## PlntGr> boxplot(weight ~ group, data = PlantGrowth, main = "PlantGrowth data",
## PlntGr+ ylab = "Dried weight of plants", col = "lightgray",
## PlntGr+ notch = TRUE, varwidth = TRUE)
## Warning in bxp(list(stats = structure(c(4.17, 4.53, 5.155, 5.33, 6.11, 3.59, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
##
## PlntGr> anova(lm(weight ~ group, data = PlantGrowth))
## Analysis of Variance Table
##
## Response: weight
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 3.7663 1.8832 4.8461 0.01591 *
## Residuals 27 10.4921 0.3886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Puromycin Reaction Velocity of an Enzymatic Reaction
Description
The Puromycin data frame has 23 rows and 3 columns of the reaction velocity versus substrate concentration in an enzymatic reaction involving untreated cells or cells treated with Puromycin.
Usage
Puromycin
Format
This data frame contains the following columns:
conc
a numeric vector of substrate concentrations (ppm)
rate
a numeric vector of instantaneous reaction rates (counts/min/min)
state
a factor with levels treated untreated
Details
Data on the velocity of an enzymatic reaction were obtained by Treloar (1974). The number of counts per minute of radioactive product from the reaction was measured as a function of substrate concentration in parts per million (ppm) and from these counts the initial rate (or velocity) of the reaction was calculated (counts/min/min). The experiment was conducted once with the enzyme treated with Puromycin, and once with the enzyme untreated.
Source
Bates, D.M. and Watts, D.G. (1988), Nonlinear Regression Analysis and Its Applications, Wiley, Appendix A1.3.
Treloar, M. A. (1974), Effects of Puromycin on Galactosyltransferase in Golgi Membranes, M.Sc. Thesis, U. of Toronto.
See Also
SSmicmen for other models fitted to this dataset.
Examples
require(stats); require(graphics)
plot(rate ~ conc, data = Puromycin, las = 1,
xlab = "Substrate concentration (ppm)",
ylab = "Reaction velocity (counts/min/min)",
pch = as.integer(Puromycin$state),
col = as.integer(Puromycin$state),
main = "Puromycin data and fitted Michaelis-Menten curves")
## simplest form of fitting the Michaelis-Menten model to these data
fm1 <- nls(rate ~ Vm * conc/(K + conc), data = Puromycin,
subset = state == "treated",
start = c(Vm = 200, K = 0.05))
fm2 <- nls(rate ~ Vm * conc/(K + conc), data = Puromycin,
subset = state == "untreated",
start = c(Vm = 160, K = 0.05))
summary(fm1)
summary(fm2)
## add fitted lines to the plot
conc <- seq(0, 1.2, length.out = 101)
lines(conc, predict(fm1, list(conc = conc)), lty = 1, col = 1)
lines(conc, predict(fm2, list(conc = conc)), lty = 2, col = 2)
legend(0.8, 120, levels(Puromycin$state),
col = 1:2, lty = 1:2, pch = 1:2)
## using partial linearity
fm3 <- nls(rate ~ conc/(K + conc), data = Puromycin,
subset = state == "treated", start = c(K = 0.05),
algorithm = "plinear")
head(Puromycin)
example(Puromycin)
##
## Prmycn> require(stats); require(graphics)
##
## Prmycn> ## Don't show:
## Prmycn> options(show.nls.convergence=FALSE)
##
## Prmycn> ## End(Don't show)
## Prmycn> plot(rate ~ conc, data = Puromycin, las = 1,
## Prmycn+ xlab = "Substrate concentration (ppm)",
## Prmycn+ ylab = "Reaction velocity (counts/min/min)",
## Prmycn+ pch = as.integer(Puromycin$state),
## Prmycn+ col = as.integer(Puromycin$state),
## Prmycn+ main = "Puromycin data and fitted Michaelis-Menten curves")
##
## Prmycn> ## simplest form of fitting the Michaelis-Menten model to these data
## Prmycn> fm1 <- nls(rate ~ Vm * conc/(K + conc), data = Puromycin,
## Prmycn+ subset = state == "treated",
## Prmycn+ start = c(Vm = 200, K = 0.05))
##
## Prmycn> fm2 <- nls(rate ~ Vm * conc/(K + conc), data = Puromycin,
## Prmycn+ subset = state == "untreated",
## Prmycn+ start = c(Vm = 160, K = 0.05))
##
## Prmycn> summary(fm1)
##
## Formula: rate ~ Vm * conc/(K + conc)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Vm 2.127e+02 6.947e+00 30.615 3.24e-11 ***
## K 6.412e-02 8.281e-03 7.743 1.57e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.93 on 10 degrees of freedom
##
##
## Prmycn> summary(fm2)
##
## Formula: rate ~ Vm * conc/(K + conc)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Vm 1.603e+02 6.480e+00 24.734 1.38e-09 ***
## K 4.771e-02 7.782e-03 6.131 0.000173 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.773 on 9 degrees of freedom
##
##
## Prmycn> ## add fitted lines to the plot
## Prmycn> conc <- seq(0, 1.2, length.out = 101)
##
## Prmycn> lines(conc, predict(fm1, list(conc = conc)), lty = 1, col = 1)
##
## Prmycn> lines(conc, predict(fm2, list(conc = conc)), lty = 2, col = 2)
##
## Prmycn> legend(0.8, 120, levels(Puromycin$state),
## Prmycn+ col = 1:2, lty = 1:2, pch = 1:2)
##
## Prmycn> ## using partial linearity
## Prmycn> fm3 <- nls(rate ~ conc/(K + conc), data = Puromycin,
## Prmycn+ subset = state == "treated", start = c(K = 0.05),
## Prmycn+ algorithm = "plinear")
Theoph Pharmacokinetics of Theophylline
Description
The Theoph data frame has 132 rows and 5 columns of data from an experiment on the pharmacokinetics of theophylline.
Usage
Theoph
Format
An object of class c("nfnGroupedData", "nfGroupedData", "groupedData", "data.frame") containing the following columns:
Subject
an ordered factor with levels 1, ..., 12 identifying the subject on whom the observation was made. The ordering is by increasing maximum concentration of theophylline observed.
Wt
weight of the subject (kg).
Dose
dose of theophylline administered orally to the subject (mg/kg).
Time
time since drug administration when the sample was drawn (hr).
conc
theophylline concentration in the sample (mg/L).
Details
Boeckmann, Sheiner and Beal (1994) report data from a study by Dr. Robert Upton of the kinetics of the anti-asthmatic drug theophylline. Twelve subjects were given oral doses of theophylline then serum concentrations were measured at 11 time points over the next 25 hours.
These data are analyzed in Davidian and Giltinan (1995) and Pinheiro and Bates (2000) using a two-compartment open pharmacokinetic model, for which a self-starting model function, SSfol, is available.
This dataset was originally part of package nlme; that package has methods (including for [, as.data.frame, plot and print) for its grouped-data classes.
Source
Boeckmann, A. J., Sheiner, L. B. and Beal, S. L. (1994), NONMEM Users Guide: Part V, NONMEM Project Group, University of California, San Francisco.
Davidian, M. and Giltinan, D. M. (1995) Nonlinear Models for Repeated Measurement Data, Chapman & Hall (section 5.5, p. 145 and section 6.6, p. 176)
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer (Appendix A.29)
See Also
SSfol
Examples
require(stats); require(graphics)
coplot(conc ~ Time | Subject, data = Theoph, show.given = FALSE)
Theoph.4 <- subset(Theoph, Subject == 4)
fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl),
data = Theoph.4)
summary(fm1)
plot(conc ~ Time, data = Theoph.4,
xlab = "Time since drug administration (hr)",
ylab = "Theophylline concentration (mg/L)",
main = "Observed concentrations and fitted model",
sub = "Theophylline data - Subject 4 only",
las = 1, col = 4)
xvals <- seq(0, par("usr")[2], length.out = 55)
lines(xvals, predict(fm1, newdata = list(Time = xvals)),
col = 4)
head(Theoph)
example(Theoph)
##
## Theoph> require(stats); require(graphics)
##
## Theoph> ## Don't show:
## Theoph> options(show.nls.convergence=FALSE)
##
## Theoph> ## End(Don't show)
## Theoph> coplot(conc ~ Time | Subject, data = Theoph, show.given = FALSE)
##
## Theoph> Theoph.4 <- subset(Theoph, Subject == 4)
##
## Theoph> fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl),
## Theoph+ data = Theoph.4)
##
## Theoph> summary(fm1)
##
## Formula: conc ~ SSfol(Dose, Time, lKe, lKa, lCl)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## lKe -2.4365 0.2257 -10.797 4.77e-06 ***
## lKa 0.1583 0.2297 0.689 0.51
## lCl -3.2861 0.1448 -22.695 1.51e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8465 on 8 degrees of freedom
##
##
## Theoph> plot(conc ~ Time, data = Theoph.4,
## Theoph+ xlab = "Time since drug administration (hr)",
## Theoph+ ylab = "Theophylline concentration (mg/L)",
## Theoph+ main = "Observed concentrations and fitted model",
## Theoph+ sub = "Theophylline data - Subject 4 only",
## Theoph+ las = 1, col = 4)
##
## Theoph> xvals <- seq(0, par("usr")[2], length.out = 55)
##
## Theoph> lines(xvals, predict(fm1, newdata = list(Time = xvals)),
## Theoph+ col = 4)
Titanic Survival of passengers on the Titanic
Description
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner ‘Titanic’, summarized according to economic status (class), sex, age and survival.
Usage
Titanic
Format
A 4-dimensional array resulting from cross-tabulating 2201 observations on 4 variables. The variables and their levels are as follows:
No Name Levels
1 Class 1st, 2nd, 3rd, Crew
2 Sex Male, Female
3 Age Child, Adult
4 Survived No, Yes
Details
The sinking of the Titanic is a famous event, and new books are still being published about it. Many well-known facts—from the proportions of first-class passengers to the ‘women and children first’ policy, and the fact that that policy was not entirely successful in saving the women and children in the third class—are reflected in the survival rates for various classes of passenger.
These data were originally collected by the British Board of Trade in their investigation of the sinking. Note that there is not complete agreement among primary sources as to the exact numbers on board, rescued, or lost.
Due in particular to the very successful film ‘Titanic’, recent years have seen a rise in public interest in the Titanic. Very detailed data about the passengers is now available on the Internet, at sites such as Encyclopedia Titanica (https://www.encyclopedia-titanica.org/).
Source
Dawson, Robert J. MacG. (1995), The ‘Unusual Episode’ Data Revisited. Journal of Statistics Education, 3. doi: 10.1080/10691898.1995.11910499.
The source provides a data set recording class, sex, age, and survival status for each person on board of the Titanic, and is based on data originally collected by the British Board of Trade and reprinted in:
British Board of Trade (1990), Report on the Loss of the ‘Titanic’ (S.S.). British Board of Trade Inquiry Report (reprint). Gloucester, UK: Allan Sutton Publishing.
Examples
require(graphics)
mosaicplot(Titanic, main = "Survival on the Titanic")
## Higher survival rates in children?
apply(Titanic, c(3, 4), sum)
## Higher survival rates in females?
apply(Titanic, c(2, 4), sum)
## Use loglm() in package 'MASS' for further analysis ...
head(Titanic)
## [1] 0 0 35 0 0 0
example(Titanic)
##
## Titanc> require(graphics)
##
## Titanc> mosaicplot(Titanic, main = "Survival on the Titanic")
##
## Titanc> ## Higher survival rates in children?
## Titanc> apply(Titanic, c(3, 4), sum)
## Survived
## Age No Yes
## Child 52 57
## Adult 1438 654
##
## Titanc> ## Higher survival rates in females?
## Titanc> apply(Titanic, c(2, 4), sum)
## Survived
## Sex No Yes
## Male 1364 367
## Female 126 344
##
## Titanc> ## Use loglm() in package 'MASS' for further analysis ...
## Titanc>
## Titanc>
## Titanc>
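The class differences in survival discussed in the Details section can be tabulated in the same style as the examples above. A small sketch:

```r
## Survival proportions by passenger class: aggregating over Sex and
## Age and normalising each row shows survival was markedly higher in
## first class than in third class or among the crew.
round(prop.table(apply(Titanic, c(1, 4), sum), margin = 1), 2)
```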
ToothGrowth The Effect of Vitamin C on Tooth Growth in Guinea Pigs
Description
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
Usage
ToothGrowth
Format
A data frame with 60 observations on 3 variables.
[,1] len numeric Tooth length
[,2] supp factor Supplement type (VC or OJ).
[,3] dose numeric Dose in milligrams/day
Source
C. I. Bliss (1952). The Statistics of Bioassay. Academic Press.
References
McNeil, D. R. (1977). Interactive Data Analysis. New York: Wiley.
Crampton, E. W. (1947). The growth of the odontoblast of the incisor teeth as a criterion of vitamin C intake of the guinea pig. The Journal of Nutrition, 33(5), 491–504. doi: 10.1093/jn/33.5.491.
Examples
require(graphics)
coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
xlab = "ToothGrowth data: length vs dose, given type of supplement")
head(ToothGrowth)
example(ToothGrowth)
##
## TthGrw> require(graphics)
##
## TthGrw> coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
## TthGrw+ xlab = "ToothGrowth data: length vs dose, given type of supplement")
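Beyond the conditioning plot in the help page, the design (two supplement types crossed with three doses) lends itself to a two-way analysis. A hedged sketch, treating dose as a factor:

```r
## Two-way ANOVA on the ToothGrowth design: both supplement type and
## dose level (as a 3-level factor) are modelled, with interaction.
summary(aov(len ~ supp * factor(dose), data = ToothGrowth))
```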
UCBAdmissions Student Admissions at UC Berkeley
Description
Aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973 classified by admission and sex.
Usage
UCBAdmissions
Format
A 3-dimensional array resulting from cross-tabulating 4526 observations on 3 variables. The variables and their levels are as follows:
No Name Levels
1 Admit Admitted, Rejected
2 Gender Male, Female
3 Dept A, B, C, D, E, F
Details
This data set is frequently used for illustrating Simpson's paradox, see Bickel et al (1975). At issue is whether the data show evidence of sex bias in admission practices. There were 2691 male applicants, of whom 1198 (44.5%) were admitted, compared with 1835 female applicants of whom 557 (30.4%) were admitted. This gives a sample odds ratio of 1.83, indicating that males were almost twice as likely to be admitted. In fact, graphical methods (as in the example below) or log-linear modelling show that the apparent association between admission and sex stems from differences in the tendency of males and females to apply to the individual departments (females used to apply more to departments with higher rejection rates).
This data set can also be used for illustrating methods for graphical display of categorical data, such as the general-purpose mosaicplot or the fourfoldplot for 2-by-2-by-k tables.
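The department-level reversal described above can be seen numerically as well as graphically. A minimal sketch using only base R table operations:

```r
## Overall admission rates by gender: males look favoured ...
prop.table(apply(UCBAdmissions, c(2, 1), sum), margin = 1)

## ... but per-department admission rates tell a different story:
## normalising over the Admit dimension within each Gender x Dept cell
## shows females admitted at comparable or higher rates in most
## departments -- the aggregate effect is Simpson's paradox.
round(prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ], 2)
```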
References
Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187, 398–403. http://www.jstor.org/stable/1739581.
Examples
require(graphics)
## Data aggregated over departments
apply(UCBAdmissions, c(1, 2), sum)
mosaicplot(apply(UCBAdmissions, c(1, 2), sum),
main = "Student admissions at UC Berkeley")
## Data for individual departments
opar <- par(mfrow = c(2, 3), oma = c(0, 0, 2, 0))
for(i in 1:6)
mosaicplot(UCBAdmissions[,,i],
xlab = "Admit", ylab = "Sex",
main = paste("Department", LETTERS[i]))
mtext(expression(bold("Student admissions at UC Berkeley")),
outer = TRUE, cex = 1.5)
par(opar)
head(UCBAdmissions)
## [1] 512 313 89 19 353 207
example(UCBAdmissions)
##
## UCBAdm> require(graphics)
##
## UCBAdm> ## Data aggregated over departments
## UCBAdm> apply(UCBAdmissions, c(1, 2), sum)
## Gender
## Admit Male Female
## Admitted 1198 557
## Rejected 1493 1278
##
## UCBAdm> mosaicplot(apply(UCBAdmissions, c(1, 2), sum),
## UCBAdm+ main = "Student admissions at UC Berkeley")
##
## UCBAdm> ## Data for individual departments
## UCBAdm> opar <- par(mfrow = c(2, 3), oma = c(0, 0, 2, 0))
##
## UCBAdm> for(i in 1:6)
## UCBAdm+ mosaicplot(UCBAdmissions[,,i],
## UCBAdm+ xlab = "Admit", ylab = "Sex",
## UCBAdm+ main = paste("Department", LETTERS[i]))
##
## UCBAdm> mtext(expression(bold("Student admissions at UC Berkeley")),
## UCBAdm+ outer = TRUE, cex = 1.5)
##
## UCBAdm> par(opar)
UKDriverDeaths Road Casualties in Great Britain 1969-84
Description
UKDriverDeaths is a time series giving the monthly totals of car drivers in Great Britain killed or seriously injured, January 1969 to December 1984. Compulsory wearing of seat belts was introduced on 31 January 1983.
Seatbelts provides more information on the same problem.
Usage
UKDriverDeaths
Seatbelts
Format
Seatbelts is a multiple time series, with columns
DriversKilled
car drivers killed.
drivers
same as UKDriverDeaths.
front
front-seat passengers killed or seriously injured.
rear
rear-seat passengers killed or seriously injured.
kms
distance driven.
PetrolPrice
petrol price.
VanKilled
number of van (‘light goods vehicle’) drivers killed.
law
0/1: was the law in effect that month?
Source
Harvey, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, pp. 519–523.
Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press. http://www.ssfpack.com/dkbook/
References
Harvey, A. C. and Durbin, J. (1986). The effects of seat belt legislation on British road casualties: A case study in structural time series modelling. Journal of the Royal Statistical Society series A, 149, 187–227. doi: 10.2307/2981553.
Examples
require(stats); require(graphics)
## work with pre-seatbelt period to identify a model, use logs
work <- window(log10(UKDriverDeaths), end = 1982+11/12)
par(mfrow = c(3, 1))
plot(work); acf(work); pacf(work)
par(mfrow = c(1, 1))
(fit <- arima(work, c(1, 0, 0), seasonal = list(order = c(1, 0, 0))))
z <- predict(fit, n.ahead = 24)
ts.plot(log10(UKDriverDeaths), z$pred, z$pred+2*z$se, z$pred-2*z$se,
lty = c(1, 3, 2, 2), col = c("black", "red", "blue", "blue"))
## now see the effect of the explanatory variables
X <- Seatbelts[, c("kms", "PetrolPrice", "law")]
X[, 1] <- log10(X[, 1]) - 4
arima(log10(Seatbelts[, "drivers"]), c(1, 0, 0),
seasonal = list(order = c(1, 0, 0)), xreg = X)
head(UKDriverDeaths)
## [1] 1687 1508 1507 1385 1632 1511
example(UKDriverDeaths)
##
## UKDrvD> require(stats); require(graphics)
##
## UKDrvD> ## work with pre-seatbelt period to identify a model, use logs
## UKDrvD> work <- window(log10(UKDriverDeaths), end = 1982+11/12)
##
## UKDrvD> par(mfrow = c(3, 1))
##
## UKDrvD> plot(work); acf(work); pacf(work)
##
## UKDrvD> par(mfrow = c(1, 1))
##
## UKDrvD> (fit <- arima(work, c(1, 0, 0), seasonal = list(order = c(1, 0, 0))))
##
## Call:
## arima(x = work, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0)))
##
## Coefficients:
## ar1 sar1 intercept
## 0.4378 0.6281 3.2274
## s.e. 0.0764 0.0637 0.0131
##
## sigma^2 estimated as 0.00157: log likelihood = 300.85, aic = -593.7
##
## UKDrvD> z <- predict(fit, n.ahead = 24)
##
## UKDrvD> ts.plot(log10(UKDriverDeaths), z$pred, z$pred+2*z$se, z$pred-2*z$se,
## UKDrvD+ lty = c(1, 3, 2, 2), col = c("black", "red", "blue", "blue"))
##
## UKDrvD> ## now see the effect of the explanatory variables
## UKDrvD> X <- Seatbelts[, c("kms", "PetrolPrice", "law")]
##
## UKDrvD> X[, 1] <- log10(X[, 1]) - 4
##
## UKDrvD> arima(log10(Seatbelts[, "drivers"]), c(1, 0, 0),
## UKDrvD+ seasonal = list(order = c(1, 0, 0)), xreg = X)
##
## Call:
## arima(x = log10(Seatbelts[, "drivers"]), order = c(1, 0, 0), seasonal = list(order = c(1,
## 0, 0)), xreg = X)
##
## Coefficients:
## ar1 sar1 intercept kms PetrolPrice law
## 0.3348 0.6672 3.3539 0.0082 -1.2224 -0.0963
## s.e. 0.0775 0.0612 0.0441 0.0902 0.3839 0.0166
##
## sigma^2 estimated as 0.001476: log likelihood = 349.73, aic = -685.46
UKLungDeaths Monthly Deaths from Lung Diseases in the UK
Description
Three time series giving the monthly deaths from bronchitis, emphysema and asthma in the UK, 1974–1979, both sexes (ldeaths), males (mdeaths) and females (fdeaths).
Usage
ldeaths
fdeaths
mdeaths
Source
P. J. Diggle (1990) Time Series: A Biostatistical Introduction. Oxford, table A.3
Examples
require(stats); require(graphics) # for time
plot(ldeaths)
plot(mdeaths, fdeaths)
## Better labels:
yr <- floor(tt <- time(mdeaths))
plot(mdeaths, fdeaths,
xy.labels = paste(month.abb[12*(tt - yr)], yr-1900, sep = "'"))
head(ldeaths)
## [1] 3035 2552 2704 2554 2014 1655
head(fdeaths)
## [1] 901 689 827 677 522 406
head(mdeaths)
## [1] 2134 1863 1877 1877 1492 1249
example(UKLungDeaths)
##
## UKLngD> require(stats); require(graphics) # for time
##
## UKLngD> plot(ldeaths)
##
## UKLngD> plot(mdeaths, fdeaths)
##
## UKLngD> ## Better labels:
## UKLngD> yr <- floor(tt <- time(mdeaths))
##
## UKLngD> plot(mdeaths, fdeaths,
## UKLngD+ xy.labels = paste(month.abb[12*(tt - yr)], yr-1900, sep = "'"))
UKgas UK Quarterly Gas Consumption
Description
Quarterly UK gas consumption from 1960Q1 to 1986Q4, in millions of therms.
Usage
UKgas
Format
A quarterly time series of length 108.
Source
Durbin, J. and Koopman, S. J. (2001) Time Series Analysis by State Space Methods. Oxford University Press. http://www.ssfpack.com/dkbook/
Examples
## maybe str(UKgas) ; plot(UKgas) ...
head(UKgas)
## [1] 160.1 129.7 84.8 120.1 160.1 124.9
example(UKgas)
##
## UKgas> ## maybe str(UKgas) ; plot(UKgas) ...
## UKgas>
## UKgas>
## UKgas>
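The help page leaves its example as a placeholder, so here is a hedged sketch of what one might do with the series. The seasonal swings grow with the level, which suggests working on the log scale before decomposing:

```r
## UKgas is strongly seasonal with amplitude increasing over time, so
## a log transform stabilises the seasonal component before a classical
## decomposition into trend, seasonal and remainder parts.
plot(UKgas)
plot(decompose(log(UKgas)))
```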
USAccDeaths Accidental Deaths in the US 1973-1978
Description
A time series giving the monthly totals of accidental deaths in the USA. The values for the first six months of 1979 are 7798 7406 8363 8460 9217 9316.
Usage
USAccDeaths
Source
P. J. Brockwell and R. A. Davis (1991) Time Series: Theory and Methods. Springer, New York.
head(USAccDeaths)
## [1] 9007 8106 8928 9137 10017 10826
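The help page gives no examples, but the 1979 figures quoted in the Description invite a forecasting check. A sketch using the airline-type seasonal ARIMA model (the orders here are an illustrative choice, not prescribed by this help page):

```r
require(stats)
## Fit a (0,1,1)x(0,1,1)[12] seasonal ARIMA to the 1973-1978 series ...
fit <- arima(USAccDeaths, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1)))
## ... then forecast six months ahead, for comparison with the 1979
## values quoted in the Description (7798 7406 8363 8460 9217 9316).
round(predict(fit, n.ahead = 6)$pred)
```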
USArrests Violent Crime Rates by US State
Description
This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.
Usage
USArrests
Format
A data frame with 50 observations on 4 variables.
[,1] Murder numeric Murder arrests (per 100,000)
[,2] Assault numeric Assault arrests (per 100,000)
[,3] UrbanPop numeric Percent urban population
[,4] Rape numeric Rape arrests (per 100,000)
Note
USArrests contains the data as in McNeil's monograph. For the UrbanPop percentages, a review of the table (No. 21) in the Statistical Abstracts 1975 reveals a transcription error for Maryland (and that McNeil used the same “round to even” rule that R's round() uses), as found by Daniel S Coven (Arizona).
See the example below on how to correct the error and improve accuracy for the ‘<n>.5’ percentages.
Source
World Almanac and Book of facts 1975. (Crime rates).
Statistical Abstracts of the United States 1975, p.20, (Urban rates), possibly available as https://books.google.ch/books?id=zl9qAAAAMAAJ&pg=PA20.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
See Also
The state data sets.
Examples
summary(USArrests)
require(graphics)
pairs(USArrests, panel = panel.smooth, main = "USArrests data")
## Difference between 'USArrests' and its correction
USArrests["Maryland", "UrbanPop"] # 67 -- the transcription error
UA.C <- USArrests
UA.C["Maryland", "UrbanPop"] <- 76.6
## also +/- 0.5 to restore the original <n>.5 percentages
s5u <- c("Colorado", "Florida", "Mississippi", "Wyoming")
s5d <- c("Nebraska", "Pennsylvania")
UA.C[s5u, "UrbanPop"] <- UA.C[s5u, "UrbanPop"] + 0.5
UA.C[s5d, "UrbanPop"] <- UA.C[s5d, "UrbanPop"] - 0.5
## ==> UA.C is now a *C*orrected version of USArrests
head(USArrests)
example(USArrests)
##
## USArrs> summary(USArrests)
## Murder Assault UrbanPop Rape
## Min. : 0.800 Min. : 45.0 Min. :32.00 Min. : 7.30
## 1st Qu.: 4.075 1st Qu.:109.0 1st Qu.:54.50 1st Qu.:15.07
## Median : 7.250 Median :159.0 Median :66.00 Median :20.10
## Mean : 7.788 Mean :170.8 Mean :65.54 Mean :21.23
## 3rd Qu.:11.250 3rd Qu.:249.0 3rd Qu.:77.75 3rd Qu.:26.18
## Max. :17.400 Max. :337.0 Max. :91.00 Max. :46.00
##
## USArrs> require(graphics)
##
## USArrs> pairs(USArrests, panel = panel.smooth, main = "USArrests data")
##
## USArrs> ## Difference between 'USArrests' and its correction
## USArrs> USArrests["Maryland", "UrbanPop"] # 67 -- the transcription error
## [1] 67
##
## USArrs> UA.C <- USArrests
##
## USArrs> UA.C["Maryland", "UrbanPop"] <- 76.6
##
## USArrs> ## also +/- 0.5 to restore the original <n>.5 percentages
## USArrs> s5u <- c("Colorado", "Florida", "Mississippi", "Wyoming")
##
## USArrs> s5d <- c("Nebraska", "Pennsylvania")
##
## USArrs> UA.C[s5u, "UrbanPop"] <- UA.C[s5u, "UrbanPop"] + 0.5
##
## USArrs> UA.C[s5d, "UrbanPop"] <- UA.C[s5d, "UrbanPop"] - 0.5
##
## USArrs> ## ==> UA.C is now a *C*orrected version of USArrests
## USArrs>
## USArrs>
## USArrs>
USJudgeRatings Lawyers’ Ratings of State Judges in the US Superior Court
Description
Lawyers' ratings of state judges in the US Superior Court.
Usage
USJudgeRatings
Format
A data frame containing 43 observations on 12 numeric variables.
[,1] CONT Number of contacts of lawyer with judge.
[,2] INTG Judicial integrity.
[,3] DMNR Demeanor.
[,4] DILG Diligence.
[,5] CFMG Case flow managing.
[,6] DECI Prompt decisions.
[,7] PREP Preparation for trial.
[,8] FAMI Familiarity with law.
[,9] ORAL Sound oral rulings.
[,10] WRIT Sound written rulings.
[,11] PHYS Physical ability.
[,12] RTEN Worthy of retention.
Source
New Haven Register, 14 January, 1977 (from John Hartigan).
Examples
require(graphics)
pairs(USJudgeRatings, main = "USJudgeRatings data")
head(USJudgeRatings)
example(USJudgeRatings)
##
## USJdgR> require(graphics)
##
## USJdgR> pairs(USJudgeRatings, main = "USJudgeRatings data")
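The scatterplot matrix shows the quality ratings to be highly correlated, which invites a principal components look. A hedged sketch (dropping CONT, the contact count, as the one variable that is not a quality rating):

```r
## PCA on the 11 quality ratings; because the ratings are strongly
## correlated, the first component captures most of the variance.
pc <- prcomp(USJudgeRatings[, -1], scale. = TRUE)
summary(pc)$importance[, 1:2]
```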
USPersonalExpenditure Personal Expenditure Data
Description
This data set consists of United States personal expenditures (in billions of dollars) in the categories food and tobacco, household operation, medical and health, personal care, and private education, for the years 1940, 1945, 1950, 1955 and 1960.
Usage
USPersonalExpenditure
Format
A matrix with 5 rows and 5 columns.
Source
The World Almanac and Book of Facts, 1962, page 756.
References
Tukey, J. W. (1977) Exploratory Data Analysis. Addison-Wesley.
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
Examples
require(stats) # for medpolish
USPersonalExpenditure
medpolish(log10(USPersonalExpenditure))
head(USPersonalExpenditure)
## 1940 1945 1950 1955 1960
## Food and Tobacco 22.200 44.500 59.60 73.2 86.80
## Household Operation 10.500 15.500 29.00 36.5 46.20
## Medical and Health 3.530 5.760 9.71 14.0 21.10
## Personal Care 1.040 1.980 2.45 3.4 5.40
## Private Education 0.341 0.974 1.80 2.6 3.64
example(USPersonalExpenditure)
##
## USPrsE> require(stats) # for medpolish
##
## USPrsE> USPersonalExpenditure
## 1940 1945 1950 1955 1960
## Food and Tobacco 22.200 44.500 59.60 73.2 86.80
## Household Operation 10.500 15.500 29.00 36.5 46.20
## Medical and Health 3.530 5.760 9.71 14.0 21.10
## Personal Care 1.040 1.980 2.45 3.4 5.40
## Private Education 0.341 0.974 1.80 2.6 3.64
##
## USPrsE> medpolish(log10(USPersonalExpenditure))
## 1: 1.126317
## 2: 1.032421
## Final: 1.032421
##
## Median Polish Results (Dataset: "log10(USPersonalExpenditure)")
##
## Overall: 0.9872192
##
## Row Effects:
## Food and Tobacco Household Operation Medical and Health Personal Care
## 0.7880270 0.4327608 0.0000000 -0.5606543
## Private Education
## -0.7319467
##
## Column Effects:
## 1940 1945 1950 1955 1960
## -0.4288933 -0.2267967 0.0000000 0.1423128 0.3058289
##
## Residuals:
## 1940 1945 1950 1955 1960
## Food and Tobacco 0.000000 0.0999105 0.000000 -0.053048 -0.142555
## Household Operation 0.030103 -0.0028516 0.042418 0.000000 -0.061167
## Medical and Health -0.010551 0.0000000 0.000000 0.016596 0.031234
## Personal Care 0.019362 0.0968971 -0.037399 -0.037399 0.000000
## Private Education -0.293625 -0.0399168 0.000000 0.017388 0.000000
VADeaths Death Rates in Virginia (1940)
Description
Death rates per 1000 in Virginia in 1940.
Usage
VADeaths
Format
A matrix with 5 rows and 4 columns.
Details
The death rates are measured per 1000 population per year. They are cross-classified by age group (rows) and population group (columns). The age groups are: 50–54, 55–59, 60–64, 65–69, 70–74 and the population groups are Rural/Male, Rural/Female, Urban/Male and Urban/Female.
This provides a rather nice 3-way analysis of variance example.
Source
Molyneaux, L., Gilliam, S. K., and Florant, L. C. (1947) Differences in Virginia death rates by color, sex, age, and rural or urban residence. American Sociological Review, 12, 525–535.
References
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
Examples
require(stats); require(graphics)
n <- length(dr <- c(VADeaths))
nam <- names(VADeaths)
d.VAD <- data.frame(
Drate = dr,
age = rep(ordered(rownames(VADeaths)), length.out = n),
gender = gl(2, 5, n, labels = c("M", "F")),
site = gl(2, 10, labels = c("rural", "urban")))
coplot(Drate ~ as.numeric(age) | gender * site, data = d.VAD,
panel = panel.smooth, xlab = "VADeaths data - Given: gender")
summary(aov.VAD <- aov(Drate ~ .^2, data = d.VAD))
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0))
plot(aov.VAD)
par(opar)
head(VADeaths)
## Rural Male Rural Female Urban Male Urban Female
## 50-54 11.7 8.7 15.4 8.4
## 55-59 18.1 11.7 24.3 13.6
## 60-64 26.9 20.3 37.0 19.3
## 65-69 41.0 30.9 54.6 35.1
## 70-74 66.0 54.3 71.1 50.0
example(VADeaths)
##
## VADths> require(stats); require(graphics)
##
## VADths> n <- length(dr <- c(VADeaths))
##
## VADths> nam <- names(VADeaths)
##
## VADths> d.VAD <- data.frame(
## VADths+ Drate = dr,
## VADths+ age = rep(ordered(rownames(VADeaths)), length.out = n),
## VADths+ gender = gl(2, 5, n, labels = c("M", "F")),
## VADths+ site = gl(2, 10, labels = c("rural", "urban")))
##
## VADths> coplot(Drate ~ as.numeric(age) | gender * site, data = d.VAD,
## VADths+ panel = panel.smooth, xlab = "VADeaths data - Given: gender")
##
## VADths> summary(aov.VAD <- aov(Drate ~ .^2, data = d.VAD))
## Df Sum Sq Mean Sq F value Pr(>F)
## age 4 6288 1572.1 590.858 8.55e-06 ***
## gender 1 648 647.5 243.361 9.86e-05 ***
## site 1 77 76.8 28.876 0.00579 **
## age:gender 4 86 21.6 8.100 0.03358 *
## age:site 4 43 10.6 3.996 0.10414
## gender:site 1 73 73.0 27.422 0.00636 **
## Residuals 4 11 2.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## VADths> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0))
##
## VADths> plot(aov.VAD)
##
## VADths> par(opar)
WWWusage Internet Usage per Minute
Description
A time series of the numbers of users connected to the Internet through a server every minute.
Usage
WWWusage
Format
A time series of length 100.
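Since WWWusage is a "ts" object, its time attributes can be inspected directly; a brief sketch:

```r
## WWWusage is a "ts" object of 100 unit-spaced observations
length(WWWusage)  # 100
tsp(WWWusage)     # start = 1, end = 100, frequency = 1
```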
Source
Durbin, J. and Koopman, S. J. (2001) Time Series Analysis by State Space Methods. Oxford University Press. http://www.ssfpack.com/dkbook/
References
Makridakis, S., Wheelwright, S. C. and Hyndman, R. J. (1998) Forecasting: Methods and Applications. Wiley.
Examples
require(graphics)
work <- diff(WWWusage)
par(mfrow = c(2, 1)); plot(WWWusage); plot(work)
## Not run:
require(stats)
aics <- matrix(, 6, 6, dimnames = list(p = 0:5, q = 0:5))
for(q in 1:5) aics[1, 1+q] <- arima(WWWusage, c(0, 1, q),
                                    optim.control = list(maxit = 500))$aic
for(p in 1:5)
  for(q in 0:5) aics[1+p, 1+q] <- arima(WWWusage, c(p, 1, q),
                                        optim.control = list(maxit = 500))$aic
round(aics - min(aics, na.rm = TRUE), 2)
## End(Not run)
head(WWWusage)
## [1] 88 84 85 85 84 85
example(WWWusage)
##
## WWWusg> require(graphics)
##
## WWWusg> work <- diff(WWWusage)
##
## WWWusg> par(mfrow = c(2, 1)); plot(WWWusage); plot(work)
##
## WWWusg> ## Not run:
## WWWusg> ##D require(stats)
## WWWusg> ##D aics <- matrix(, 6, 6, dimnames = list(p = 0:5, q = 0:5))
## WWWusg> ##D for(q in 1:5) aics[1, 1+q] <- arima(WWWusage, c(0, 1, q),
## WWWusg> ##D optim.control = list(maxit = 500))$aic
## WWWusg> ##D for(p in 1:5)
## WWWusg> ##D for(q in 0:5) aics[1+p, 1+q] <- arima(WWWusage, c(p, 1, q),
## WWWusg> ##D optim.control = list(maxit = 500))$aic
## WWWusg> ##D round(aics - min(aics, na.rm = TRUE), 2)
## WWWusg> ## End(Not run)
## WWWusg>
## WWWusg>
WorldPhones The World’s Telephones
Description
The number of telephones in various regions of the world (in thousands).
Usage
WorldPhones
Format
A matrix with 7 rows and 7 columns. The columns of the matrix give the figures for a given region, and the rows the figures for a given year.
The regions are: North America, Europe, Asia, South America, Oceania, Africa, Central America.
The years are: 1951, 1956, 1957, 1958, 1959, 1960, 1961.
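Because the row names carry the years and the column names the regions, yearly world totals and per-region growth follow directly from the matrix structure; for instance:

```r
## Yearly world totals (in thousands), summing across the regions:
rowSums(WorldPhones)
## Growth of each region between the first and last recorded year:
WorldPhones["1961", ] / WorldPhones["1951", ]
```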
Source
AT&T (1961) The World's Telephones.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(graphics)
matplot(rownames(WorldPhones), WorldPhones, type = "b", log = "y",
        xlab = "Year", ylab = "Number of telephones (1000's)")
legend(1951.5, 80000, colnames(WorldPhones), col = 1:6, lty = 1:5,
       pch = rep(21, 7))
title(main = "World phones data: log scale for response")
head(WorldPhones)
## N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer
## 1951 45939 21574 2876 1815 1646 89 555
## 1956 60423 29990 4708 2568 2366 1411 733
## 1957 64721 32510 5230 2695 2526 1546 773
## 1958 68484 35218 6662 2845 2691 1663 836
## 1959 71799 37598 6856 3000 2868 1769 911
## 1960 76036 40341 8220 3145 3054 1905 1008
example(WorldPhones)
##
## WrldPh> require(graphics)
##
## WrldPh> matplot(rownames(WorldPhones), WorldPhones, type = "b", log = "y",
## WrldPh+ xlab = "Year", ylab = "Number of telephones (1000's)")
##
## WrldPh> legend(1951.5, 80000, colnames(WorldPhones), col = 1:6, lty = 1:5,
## WrldPh+ pch = rep(21, 7))
##
## WrldPh> title(main = "World phones data: log scale for response")
ability.cov Ability and Intelligence Tests
Description
Six tests were given to 112 individuals. The covariance matrix is given in this object.
Usage
ability.cov
Details
The tests are described as
general:
a non-verbal measure of general intelligence using Cattell's culture-fair test.
picture:
a picture-completion test
blocks:
block design
maze:
mazes
reading:
reading comprehension
vocab:
vocabulary
Bartholomew gives both covariance and correlation matrices, but these are inconsistent. Neither is in the original paper.
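Since the object stores a covariance matrix (along with $center and $n.obs), the corresponding correlation matrix can be derived with stats::cov2cor(); a minimal sketch:

```r
require(stats)
## Rescale the stored covariance matrix to a correlation matrix:
R <- cov2cor(ability.cov$cov)
round(R, 2)
```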
Source
Bartholomew, D. J. (1987). Latent Variable Analysis and Factor Analysis. Griffin.
Bartholomew, D. J. and Knott, M. (1990). Latent Variable Analysis and Factor Analysis. Second Edition, Arnold.
References
Smith, G. A. and Stanley G. (1983). Clocking g: relating intelligence and measures of timed performance. Intelligence, 7, 353–368. doi: 10.1016/0160-2896(83)90010-7.
Examples
require(stats)
(ability.FA <- factanal(factors = 1, covmat = ability.cov))
update(ability.FA, factors = 2)
## The signs of factors and hence the signs of correlations are
## arbitrary with promax rotation.
update(ability.FA, factors = 2, rotation = "promax")
head(ability.cov)
## $cov
## general picture blocks maze reading vocab
## general 24.641 5.991 33.520 6.023 20.755 29.701
## picture 5.991 6.700 18.137 1.782 4.936 7.204
## blocks 33.520 18.137 149.831 19.424 31.430 50.753
## maze 6.023 1.782 19.424 12.711 4.757 9.075
## reading 20.755 4.936 31.430 4.757 52.604 66.762
## vocab 29.701 7.204 50.753 9.075 66.762 135.292
##
## $center
## [1] 0 0 0 0 0 0
##
## $n.obs
## [1] 112
example(ability.cov)
##
## ablty.> ## No test:
## ablty.> ##D require(stats)
## ablty.> ##D (ability.FA <- factanal(factors = 1, covmat = ability.cov))
## ablty.> ##D update(ability.FA, factors = 2)
## ablty.> ##D ## The signs of factors and hence the signs of correlations are
## ablty.> ##D ## arbitrary with promax rotation.
## ablty.> ##D update(ability.FA, factors = 2, rotation = "promax")
## ablty.> ## End(No test)
## ablty.>
## ablty.>
airmiles Passenger Miles on Commercial US Airlines, 1937-1960
Description
The revenue passenger miles flown by commercial airlines in the United States for each year from 1937 to 1960.
Usage
airmiles
Format
A time series of 24 observations; yearly, 1937–1960.
Source
F.A.A. Statistical Handbook of Aviation.
References
Brown, R. G. (1963) Smoothing, Forecasting and Prediction of Discrete Time Series. Prentice-Hall.
Examples
require(graphics)
plot(airmiles, main = "airmiles data",
     xlab = "Passenger-miles flown by U.S. commercial airlines", col = 4)
head(airmiles)
## [1] 412 480 683 1052 1385 1418
example(airmiles)
##
## airmls> require(graphics)
##
## airmls> plot(airmiles, main = "airmiles data",
## airmls+ xlab = "Passenger-miles flown by U.S. commercial airlines", col = 4)
airquality New York Air Quality Measurements
Description
Daily air quality measurements in New York, May to September 1973.
Usage
airquality
Format
A data frame with 153 observations on 6 variables.
[,1] Ozone numeric Ozone (ppb)
[,2] Solar.R numeric Solar R (lang)
[,3] Wind numeric Wind (mph)
[,4] Temp numeric Temperature (degrees F)
[,5] Month numeric Month (1--12)
[,6] Day numeric Day of month (1--31)
Details
Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at La Guardia Airport.
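Ozone and Solar.R contain missing values, which is worth checking before any modelling; a quick count per column:

```r
## Count missing values per variable (Ozone and Solar.R have NAs):
colSums(is.na(airquality))
```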
Source
The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).
References
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.
Examples
require(graphics)
pairs(airquality, panel = panel.smooth, main = "airquality data")
head(airquality)
example(airquality)
##
## arqlty> require(graphics)
##
## arqlty> pairs(airquality, panel = panel.smooth, main = "airquality data")
anscombe Anscombe’s Quartet of ‘Identical’ Simple Linear Regressions
Description
Four x-y datasets which have the same traditional statistical properties (mean, variance, correlation, regression line, etc.), yet are quite different.
Usage
anscombe
Format
A data frame with 11 observations on 8 variables.
x1 == x2 == x3 the integers 4:14, specially arranged
x4 values 8 and 19
y1, y2, y3, y4 numbers in (3, 12.5) with mean 7.5 and sdev 2.03
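The 'identical' summary statistics claimed above can be verified directly; a small sketch:

```r
## Check the near-identical statistics across the four (x, y) pairs:
sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean.x = mean(x), var.x = var(x),
    mean.y = mean(y), cor.xy = cor(x, y))
})
```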
Source
Tufte, Edward R. (1989). The Visual Display of Quantitative Information, 13–14. Graphics Press.
References
Anscombe, Francis J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21. doi: 10.2307/2682899.
Examples
require(stats); require(graphics)
summary(anscombe)
##-- now some "magic" to do the 4 regressions in a loop:
ff <- y ~ x
mods <- setNames(as.list(1:4), paste0("lm", 1:4))
for(i in 1:4) {
  ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
  ## or  ff[[2]] <- as.name(paste0("y", i))
  ##     ff[[3]] <- as.name(paste0("x", i))
  mods[[i]] <- lmi <- lm(ff, data = anscombe)
  print(anova(lmi))
}
## See how close they are (numerically!)
sapply(mods, coef)
lapply(mods, function(fm) coef(summary(fm)))
## Now, do what you should have done in the first place: PLOTS
op <- par(mfrow = c(2, 2), mar = 0.1+c(4,4,1,1), oma = c(0, 0, 2, 0))
for(i in 1:4) {
  ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
  plot(ff, data = anscombe, col = "red", pch = 21, bg = "orange", cex = 1.2,
       xlim = c(3, 19), ylim = c(3, 13))
  abline(mods[[i]], col = "blue")
}
mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex = 1.5)
par(op)
head(anscombe)
example(anscombe)
##
## anscmb> require(stats); require(graphics)
##
## anscmb> summary(anscombe)
## x1 x2 x3 x4 y1
## Min. : 4.0 Min. : 4.0 Min. : 4.0 Min. : 8 Min. : 4.260
## 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 8 1st Qu.: 6.315
## Median : 9.0 Median : 9.0 Median : 9.0 Median : 8 Median : 7.580
## Mean : 9.0 Mean : 9.0 Mean : 9.0 Mean : 9 Mean : 7.501
## 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.: 8 3rd Qu.: 8.570
## Max. :14.0 Max. :14.0 Max. :14.0 Max. :19 Max. :10.840
## y2 y3 y4
## Min. :3.100 Min. : 5.39 Min. : 5.250
## 1st Qu.:6.695 1st Qu.: 6.25 1st Qu.: 6.170
## Median :8.140 Median : 7.11 Median : 7.040
## Mean :7.501 Mean : 7.50 Mean : 7.501
## 3rd Qu.:8.950 3rd Qu.: 7.98 3rd Qu.: 8.190
## Max. :9.260 Max. :12.74 Max. :12.500
##
## anscmb> ##-- now some "magic" to do the 4 regressions in a loop:
## anscmb> ff <- y ~ x
##
## anscmb> mods <- setNames(as.list(1:4), paste0("lm", 1:4))
##
## anscmb> for(i in 1:4) {
## anscmb+ ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
## anscmb+ ## or ff[[2]] <- as.name(paste0("y", i))
## anscmb+ ## ff[[3]] <- as.name(paste0("x", i))
## anscmb+ mods[[i]] <- lmi <- lm(ff, data = anscombe)
## anscmb+ print(anova(lmi))
## anscmb+ }
## Analysis of Variance Table
##
## Response: y1
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 27.510 27.5100 17.99 0.00217 **
## Residuals 9 13.763 1.5292
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
##
## Response: y2
## Df Sum Sq Mean Sq F value Pr(>F)
## x2 1 27.500 27.5000 17.966 0.002179 **
## Residuals 9 13.776 1.5307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
##
## Response: y3
## Df Sum Sq Mean Sq F value Pr(>F)
## x3 1 27.470 27.4700 17.972 0.002176 **
## Residuals 9 13.756 1.5285
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
##
## Response: y4
## Df Sum Sq Mean Sq F value Pr(>F)
## x4 1 27.490 27.4900 18.003 0.002165 **
## Residuals 9 13.742 1.5269
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## anscmb> ## See how close they are (numerically!)
## anscmb> sapply(mods, coef)
## lm1 lm2 lm3 lm4
## (Intercept) 3.0000909 3.000909 3.0024545 3.0017273
## x1 0.5000909 0.500000 0.4997273 0.4999091
##
## anscmb> lapply(mods, function(fm) coef(summary(fm)))
## $lm1
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0000909 1.1247468 2.667348 0.025734051
## x1 0.5000909 0.1179055 4.241455 0.002169629
##
## $lm2
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.000909 1.1253024 2.666758 0.025758941
## x2 0.500000 0.1179637 4.238590 0.002178816
##
## $lm3
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0024545 1.1244812 2.670080 0.025619109
## x3 0.4997273 0.1178777 4.239372 0.002176305
##
## $lm4
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0017273 1.1239211 2.670763 0.025590425
## x4 0.4999091 0.1178189 4.243028 0.002164602
##
##
## anscmb> ## Now, do what you should have done in the first place: PLOTS
## anscmb> op <- par(mfrow = c(2, 2), mar = 0.1+c(4,4,1,1), oma = c(0, 0, 2, 0))
##
## anscmb> for(i in 1:4) {
## anscmb+ ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
## anscmb+ plot(ff, data = anscombe, col = "red", pch = 21, bg = "orange", cex = 1.2,
## anscmb+ xlim = c(3, 19), ylim = c(3, 13))
## anscmb+ abline(mods[[i]], col = "blue")
## anscmb+ }
##
## anscmb> mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex = 1.5)
##
## anscmb> par(op)
attenu The Joyner-Boore Attenuation Data
Description
These data give peak accelerations measured at various observation stations for 23 earthquakes in California. The data have been used by various workers to estimate the attenuating effect of distance on ground acceleration.
Usage
attenu
Format
A data frame with 182 observations on 5 variables.
[,1] event numeric Event Number
[,2] mag numeric Moment Magnitude
[,3] station factor Station Number
[,4] dist numeric Station-hypocenter distance (km)
[,5] accel numeric Peak acceleration (g)
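Note that station is a factor and some records lack a station identifier, so NA handling matters when grouping by station; a quick check:

```r
## Some records have no station identifier:
sum(is.na(attenu$station))
```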
Source
Joyner, W. B., D. M. Boore and R. D. Porcella (1981). Peak horizontal acceleration and velocity from strong-motion records including records from the 1979 Imperial Valley, California earthquake. USGS Open File Report 81-365. Menlo Park, CA.
References
Boore, D. M. and Joyner, W. B. (1982). The empirical prediction of ground motion, Bulletin of the Seismological Society of America, 72, S269–S268.
Bolt, B. A. and Abrahamson, N. A. (1982). New attenuation relations for peak and expected accelerations of strong ground motion. Bulletin of the Seismological Society of America, 72, 2307–2321.
Bolt B. A. and Abrahamson, N. A. (1983). Reply to W. B. Joyner & D. M. Boore's “Comments on: New attenuation relations for peak and expected accelerations for peak and expected accelerations of strong ground motion”, Bulletin of the Seismological Society of America, 73, 1481–1483.
Brillinger, D. R. and Preisler, H. K. (1984). An exploratory analysis of the Joyner-Boore attenuation data, Bulletin of the Seismological Society of America, 74, 1441–1449.
Brillinger, D. R. and Preisler, H. K. (1984). Further analysis of the Joyner-Boore attenuation data. Manuscript.
Examples
require(graphics)
## check the data class of the variables
sapply(attenu, data.class)
summary(attenu)
pairs(attenu, main = "attenu data")
coplot(accel ~ dist | as.factor(event), data = attenu, show.given = FALSE)
coplot(log(accel) ~ log(dist) | as.factor(event),
       data = attenu, panel = panel.smooth, show.given = FALSE)
head(attenu)
example(attenu)
##
## attenu> require(graphics)
##
## attenu> ## check the data class of the variables
## attenu> sapply(attenu, data.class)
## event mag station dist accel
## "numeric" "numeric" "factor" "numeric" "numeric"
##
## attenu> summary(attenu)
## event mag station dist
## Min. : 1.00 Min. :5.000 117 : 5 Min. : 0.50
## 1st Qu.: 9.00 1st Qu.:5.300 1028 : 4 1st Qu.: 11.32
## Median :18.00 Median :6.100 113 : 4 Median : 23.40
## Mean :14.74 Mean :6.084 112 : 3 Mean : 45.60
## 3rd Qu.:20.00 3rd Qu.:6.600 135 : 3 3rd Qu.: 47.55
## Max. :23.00 Max. :7.700 (Other):147 Max. :370.00
## NA's : 16
## accel
## Min. :0.00300
## 1st Qu.:0.04425
## Median :0.11300
## Mean :0.15422
## 3rd Qu.:0.21925
## Max. :0.81000
##
##
## attenu> pairs(attenu, main = "attenu data")
##
## attenu> coplot(accel ~ dist | as.factor(event), data = attenu, show.given = FALSE)
##
## attenu> coplot(log(accel) ~ log(dist) | as.factor(event),
## attenu+ data = attenu, panel = panel.smooth, show.given = FALSE)
attitude The Chatterjee-Price Attitude Data
Description
From a survey of the clerical employees of a large financial organization, the data are aggregated from the questionnaires of the approximately 35 employees for each of 30 (randomly selected) departments. The numbers give the percentage of favourable responses to seven questions in each department.
Usage
attitude
Format
A data frame with 30 observations on 7 variables. The first column gives the short names from the reference, the second the variable names in the data frame:
Y rating numeric Overall rating
X[1] complaints numeric Handling of employee complaints
X[2] privileges numeric Does not allow special privileges
X[3] learning numeric Opportunity to learn
X[4] raises numeric Raises based on performance
X[5] critical numeric Too critical
X[6] advance numeric Advancement
Source
Chatterjee, S. and Price, B. (1977) Regression Analysis by Example. New York: Wiley. (Section 3.7, p.68ff of 2nd ed.(1991).)
Examples
require(stats); require(graphics)
pairs(attitude, main = "attitude data")
summary(attitude)
summary(fm1 <- lm(rating ~ ., data = attitude))
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
            mar = c(4.1, 4.1, 2.1, 1.1))
plot(fm1)
summary(fm2 <- lm(rating ~ complaints, data = attitude))
plot(fm2)
par(opar)
head(attitude)
example(attitude)
##
## attitd> require(stats); require(graphics)
##
## attitd> pairs(attitude, main = "attitude data")
##
## attitd> summary(attitude)
## rating complaints privileges learning raises
## Min. :40.00 Min. :37.0 Min. :30.00 Min. :34.00 Min. :43.00
## 1st Qu.:58.75 1st Qu.:58.5 1st Qu.:45.00 1st Qu.:47.00 1st Qu.:58.25
## Median :65.50 Median :65.0 Median :51.50 Median :56.50 Median :63.50
## Mean :64.63 Mean :66.6 Mean :53.13 Mean :56.37 Mean :64.63
## 3rd Qu.:71.75 3rd Qu.:77.0 3rd Qu.:62.50 3rd Qu.:66.75 3rd Qu.:71.00
## Max. :85.00 Max. :90.0 Max. :83.00 Max. :75.00 Max. :88.00
## critical advance
## Min. :49.00 Min. :25.00
## 1st Qu.:69.25 1st Qu.:35.00
## Median :77.50 Median :41.00
## Mean :74.77 Mean :42.93
## 3rd Qu.:80.00 3rd Qu.:47.75
## Max. :92.00 Max. :72.00
##
## attitd> summary(fm1 <- lm(rating ~ ., data = attitude))
##
## Call:
## lm(formula = rating ~ ., data = attitude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.9418 -4.3555 0.3158 5.5425 11.5990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.78708 11.58926 0.931 0.361634
## complaints 0.61319 0.16098 3.809 0.000903 ***
## privileges -0.07305 0.13572 -0.538 0.595594
## learning 0.32033 0.16852 1.901 0.069925 .
## raises 0.08173 0.22148 0.369 0.715480
## critical 0.03838 0.14700 0.261 0.796334
## advance -0.21706 0.17821 -1.218 0.235577
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.068 on 23 degrees of freedom
## Multiple R-squared: 0.7326, Adjusted R-squared: 0.6628
## F-statistic: 10.5 on 6 and 23 DF, p-value: 1.24e-05
##
##
## attitd> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
## attitd+ mar = c(4.1, 4.1, 2.1, 1.1))
##
## attitd> plot(fm1)
##
## attitd> summary(fm2 <- lm(rating ~ complaints, data = attitude))
##
## Call:
## lm(formula = rating ~ complaints, data = attitude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.8799 -5.9905 0.1783 6.2978 9.6294
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.37632 6.61999 2.172 0.0385 *
## complaints 0.75461 0.09753 7.737 1.99e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.993 on 28 degrees of freedom
## Multiple R-squared: 0.6813, Adjusted R-squared: 0.6699
## F-statistic: 59.86 on 1 and 28 DF, p-value: 1.988e-08
##
##
## attitd> plot(fm2)
##
## attitd> par(opar)
austres Quarterly Time Series of the Number of Australian Residents
Description
Numbers (in thousands) of Australian residents measured quarterly from March 1971 to March 1994. The object is of class "ts".
Usage
austres
Source
P. J. Brockwell and R. A. Davis (1996) Introduction to Time Series and Forecasting. Springer.
head(austres)
## [1] 13067.3 13130.5 13198.4 13254.2 13303.7 13353.9
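The help page gives no examples for austres; as a quarterly "ts" object it can be inspected and plotted in the usual way (a minimal sketch):

```r
require(graphics)
frequency(austres)  # 4 observations per year
plot(austres, ylab = "Australian residents (thousands)")
```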
beavers Body Temperature Series of Two Beavers
Description
Reynolds (1994) describes a small part of a study of the long-term temperature dynamics of beaver Castor canadensis in north-central Wisconsin. Body temperature was measured by telemetry every 10 minutes for four females, but data from one period of less than a day for each of two animals are used here.
Usage
beaver1
beaver2
Format
The beaver1 data frame has 114 rows and 4 columns on body temperature measurements at 10 minute intervals.
The beaver2 data frame has 100 rows and 4 columns on body temperature measurements at 10 minute intervals.
The variables are as follows:
day
Day of observation (in days since the beginning of 1990), December 12–13 (beaver1) and November 3–4 (beaver2).
time
Time of observation, in the form 0330 for 3:30am
temp
Measured body temperature in degrees Celsius.
activ
Indicator of activity outside the retreat.
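The hhmm integer encoding of time can be converted to decimal hours with integer division and remainder, as the plotting helper in the Examples also does; for instance (hm2hours is a hypothetical helper name):

```r
## Convert hhmm (e.g. 3:30am stored as the integer 330) to decimal hours:
hm2hours <- function(hm) hm %/% 100 + (hm %% 100) / 60
head(hm2hours(beaver1$time))
```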
Note
The observation at 22:20 is missing in beaver1.
Source
P. S. Reynolds (1994) Time-series analyses of beaver body temperatures. Chapter 11 of Lange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L. and Greenhouse, J. eds (1994) Case Studies in Biometry. New York: John Wiley and Sons.
Examples
require(graphics)
(yl <- range(beaver1$temp, beaver2$temp))
beaver.plot <- function(bdat, ...) {
  nam <- deparse(substitute(bdat))
  with(bdat, {
    # Hours since start of day:
    hours <- time %/% 100 + 24*(day - day[1]) + (time %% 100)/60
    plot(hours, temp, type = "l", ...,
         main = paste(nam, "body temperature"))
    abline(h = 37.5, col = "gray", lty = 2)
    is.act <- activ == 1
    points(hours[is.act], temp[is.act], col = 2, cex = .8)
  })
}
op <- par(mfrow = c(2, 1), mar = c(3, 3, 4, 2), mgp = 0.9 * 2:0)
beaver.plot(beaver1, ylim = yl)
beaver.plot(beaver2, ylim = yl)
par(op)
head(beaver1)
head(beaver2)
example(beavers)
##
## beavrs> require(graphics)
##
## beavrs> (yl <- range(beaver1$temp, beaver2$temp))
## [1] 36.33 38.35
##
## beavrs> beaver.plot <- function(bdat, ...) {
## beavrs+ nam <- deparse(substitute(bdat))
## beavrs+ with(bdat, {
## beavrs+ # Hours since start of day:
## beavrs+ hours <- time %/% 100 + 24*(day - day[1]) + (time %% 100)/60
## beavrs+ plot (hours, temp, type = "l", ...,
## beavrs+ main = paste(nam, "body temperature"))
## beavrs+ abline(h = 37.5, col = "gray", lty = 2)
## beavrs+ is.act <- activ == 1
## beavrs+ points(hours[is.act], temp[is.act], col = 2, cex = .8)
## beavrs+ })
## beavrs+ }
##
## beavrs> op <- par(mfrow = c(2, 1), mar = c(3, 3, 4, 2), mgp = 0.9 * 2:0)
##
## beavrs> beaver.plot(beaver1, ylim = yl)
##
## beavrs> beaver.plot(beaver2, ylim = yl)
##
## beavrs> par(op)
cars Speed and Stopping Distances of Cars
Description
The data give the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s.
Usage
cars
Format
A data frame with 50 observations on 2 variables.
[,1] speed numeric Speed (mph)
[,2] dist numeric Stopping distance (ft)
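The units are imperial; since the conversion factors are exact, a metric view is easy to derive:

```r
## speed in km/h, stopping distance in metres:
head(transform(cars, speed.kmh = speed * 1.609344, dist.m = dist * 0.3048))
```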
Source
Ezekiel, M. (1930) Methods of Correlation Analysis. Wiley.
References
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
Examples
require(stats); require(graphics)
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     las = 1)
lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")
title(main = "cars data")
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     las = 1, log = "xy")
title(main = "cars data (logarithmic scales)")
lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")
summary(fm1 <- lm(log(dist) ~ log(speed), data = cars))
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
            mar = c(4.1, 4.1, 2.1, 1.1))
plot(fm1)
par(opar)
## An example of polynomial regression
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     las = 1, xlim = c(0, 25))
d <- seq(0, 25, length.out = 200)
for(degree in 1:4) {
  fm <- lm(dist ~ poly(speed, degree), data = cars)
  assign(paste("cars", degree, sep = "."), fm)
  lines(d, predict(fm, data.frame(speed = d)), col = degree)
}
anova(cars.1, cars.2, cars.3, cars.4)
head(cars)
example(cars)
##
## cars> require(stats); require(graphics)
##
## cars> plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
## cars+ las = 1)
##
## cars> lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")
##
## cars> title(main = "cars data")
##
## cars> plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
## cars+ las = 1, log = "xy")
##
## cars> title(main = "cars data (logarithmic scales)")
##
## cars> lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")
##
## cars> summary(fm1 <- lm(log(dist) ~ log(speed), data = cars))
##
## Call:
## lm(formula = log(dist) ~ log(speed), data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00215 -0.24578 -0.02898 0.20717 0.88289
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.7297 0.3758 -1.941 0.0581 .
## log(speed) 1.6024 0.1395 11.484 2.26e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4053 on 48 degrees of freedom
## Multiple R-squared: 0.7331, Adjusted R-squared: 0.7276
## F-statistic: 131.9 on 1 and 48 DF, p-value: 2.259e-15
##
##
## cars> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
## cars+ mar = c(4.1, 4.1, 2.1, 1.1))
##
## cars> plot(fm1)
##
## cars> par(opar)
##
## cars> ## An example of polynomial regression
## cars> plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
## cars+ las = 1, xlim = c(0, 25))
##
## cars> d <- seq(0, 25, length.out = 200)
##
## cars> for(degree in 1:4) {
## cars+ fm <- lm(dist ~ poly(speed, degree), data = cars)
## cars+ assign(paste("cars", degree, sep = "."), fm)
## cars+ lines(d, predict(fm, data.frame(speed = d)), col = degree)
## cars+ }
##
## cars> anova(cars.1, cars.2, cars.3, cars.4)
## Analysis of Variance Table
##
## Model 1: dist ~ poly(speed, degree)
## Model 2: dist ~ poly(speed, degree)
## Model 3: dist ~ poly(speed, degree)
## Model 4: dist ~ poly(speed, degree)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 48 11354
## 2 47 10825 1 528.81 2.3108 0.1355
## 3 46 10634 1 190.35 0.8318 0.3666
## 4 45 10298 1 336.55 1.4707 0.2316
chickwts Chicken Weights by Feed Type
Description
An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens.
Usage
chickwts
Format
A data frame with 71 observations on the following 2 variables.
weight
a numeric variable giving the chick weight.
feed
a factor giving the feed type.
Details
Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Their weights in grams after six weeks are given along with feed types.
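Group means by feed type summarise the design at a glance; note that the group sizes are not all equal:

```r
## Mean six-week weight (grams) for each feed supplement:
with(chickwts, tapply(weight, feed, mean))
## Group sizes:
table(chickwts$feed)
```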
Source
Anonymous (1948) Biometrika, 35, 214.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(stats); require(graphics)
boxplot(weight ~ feed, data = chickwts, col = "lightgray",
        varwidth = TRUE, notch = TRUE, main = "chickwt data",
        ylab = "Weight at six weeks (gm)")
anova(fm1 <- lm(weight ~ feed, data = chickwts))
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
            mar = c(4.1, 4.1, 2.1, 1.1))
plot(fm1)
par(opar)
head(chickwts)
example(chickwts)
##
## chckwt> require(stats); require(graphics)
##
## chckwt> boxplot(weight ~ feed, data = chickwts, col = "lightgray",
## chckwt+ varwidth = TRUE, notch = TRUE, main = "chickwt data",
## chckwt+ ylab = "Weight at six weeks (gm)")
## Warning in bxp(list(stats = structure(c(216, 271.5, 342, 373.5, 404, 108, : some
## notches went outside hinges ('box'): maybe set notch=FALSE
##
## chckwt> anova(fm1 <- lm(weight ~ feed, data = chickwts))
## Analysis of Variance Table
##
## Response: weight
## Df Sum Sq Mean Sq F value Pr(>F)
## feed 5 231129 46226 15.365 5.936e-10 ***
## Residuals 65 195556 3009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## chckwt> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
## chckwt+ mar = c(4.1, 4.1, 2.1, 1.1))
##
## chckwt> plot(fm1)
##
## chckwt> par(opar)
co2 Mauna Loa Atmospheric CO2 Concentration
Description
Atmospheric concentrations of CO2 are expressed in parts per million (ppm) and reported in the preliminary 1997 SIO manometric mole fraction scale.
Usage
co2
Format
A time series of 468 observations; monthly from 1959 to 1997.
Details
The values for February, March and April of 1964 were missing and have been obtained by interpolating linearly between the values for January and May of 1964.
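The interpolated stretch of early 1964 can be extracted with window():

```r
## The Feb-Apr 1964 values were linearly interpolated:
window(co2, start = c(1964, 1), end = c(1964, 5))
```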
Source
Keeling, C. D. and Whorf, T. P., Scripps Institution of Oceanography (SIO), University of California, La Jolla, California USA 92093-0220.
ftp://cdiac.esd.ornl.gov/pub/maunaloa-co2/maunaloa.co2.
References
Cleveland, W. S. (1993) Visualizing Data. New Jersey: Summit Press.
Examples
require(graphics)
plot(co2, ylab = expression("Atmospheric concentration of CO"[2]),
     las = 1)
title(main = "co2 data set")
head(co2)
## [1] 315.42 316.31 316.50 317.56 318.13 318.00
example(co2)
##
## co2> require(graphics)
##
## co2> plot(co2, ylab = expression("Atmospheric concentration of CO"[2]),
## co2+ las = 1)
##
## co2> title(main = "co2 data set")
crimtab Student’s 3000 Criminals Data
Description
Data of 3000 male criminals over 20 years old undergoing their sentences in the chief prisons of England and Wales.
Usage
crimtab
Format
A table object of integer counts, of dimension 42 x 22, with a total count, sum(crimtab), of 3000.
The 42 rownames ("9.4", "9.5", ...) correspond to midpoints of intervals of finger lengths whereas the 22 column names (colnames) ("142.24", "144.78", ...) correspond to (body) heights of 3000 criminals, see also below.
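Since crimtab is a two-way table, marginal distributions and a weighted mean height come straight from the row and column sums; a brief sketch:

```r
## Mean height (cm) of the 3000 criminals, weighting the column
## midpoints by their marginal counts:
height <- as.numeric(colnames(crimtab))
sum(height * colSums(crimtab)) / sum(crimtab)
```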
Details
Student is the pseudonym of William Sealy Gosset. In his 1908 paper he wrote (on page 13) at the beginning of section VI entitled Practical Test of the foregoing Equations:
“Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W. R. MacDonell (Biometrika, Vol. I., p. 219). The measurements were written out on 3000 pieces of cardboard, which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book, which thus contains the measurements of 3000 criminals in a random order. Finally, each consecutive set of 4 was taken as a sample—750 in all—and the mean, standard deviation, and correlation of each sample determined. The difference between the mean of each sample and the mean of the population was then divided by the standard deviation of the sample, giving us the z of Section III.”
The table is in fact on page 216, not page 219, of MacDonell (1902). In the MacDonell table, the middle finger lengths were given in mm and the heights in feet/inches intervals; both are converted into cm here. The midpoints of intervals were used, e.g., where MacDonell has 4' 7''9/16 -- 8''9/16, we have 142.24, which is 2.54*56 = 2.54*(4' 8'').
MacDonell credited the source of the data (page 178) as follows: The data on which the memoir is based were obtained, through the kindness of Dr Garson, from the Central Metric Office, New Scotland Yard... He pointed out on page 179 that: The forms were drawn at random from the mass on the office shelves; we are therefore dealing with a random sampling.
Source
http://pbil.univ-lyon1.fr/R/donnees/criminals1902.txt thanks to Jean R. Lobry and Anne-Béatrice Dufour.
References
Garson, J. G. (1900). The metric system of identification of criminals, as used in Great Britain and Ireland. The Journal of the Anthropological Institute of Great Britain and Ireland, 30, 161–198. doi: 10.2307/2842627.
MacDonell, W.R. (1902). On criminal anthropometry and the identification of criminals. Biometrika, 1(2), 177–227. doi: 10.2307/2331487.
Student (1908). The probable error of a mean. Biometrika, 6, 1–25. doi: 10.2307/2331554.
Examples
require(stats)
dim(crimtab)
utils::str(crimtab)
## for nicer printing:
local({cT <- crimtab
       colnames(cT) <- substring(colnames(cT), 2, 3)
       print(cT, zero.print = " ")
})
## Repeat Student's experiment:
# 1) Reconstitute 3000 raw data for heights in inches and rounded to
# nearest integer as in Student's paper:
(heIn <- round(as.numeric(colnames(crimtab)) / 2.54))
d.hei <- data.frame(height = rep(heIn, colSums(crimtab)))
# 2) shuffle the data:
set.seed(1)
d.hei <- d.hei[sample(1:3000), , drop = FALSE]
# 3) Make 750 samples each of size 4:
d.hei$sample <- as.factor(rep(1:750, each = 4))
# 4) Compute the means and standard deviations (n) for the 750 samples:
h.mean <- with(d.hei, tapply(height, sample, FUN = mean))
h.sd <- with(d.hei, tapply(height, sample, FUN = sd)) * sqrt(3/4)
# 5) Compute the difference between the mean of each sample and
# the mean of the population and then divide by the
# standard deviation of the sample:
zobs <- (h.mean - mean(d.hei[,"height"]))/h.sd
# 6) Replace infinite values by +/- 6 as in Student's paper:
zobs[infZ <- is.infinite(zobs)] # none of them
zobs[infZ] <- 6 * sign(zobs[infZ])
# 7) Plot the distribution:
require(grDevices); require(graphics)
hist(x = zobs, probability = TRUE, xlab = "Student's z",
col = grey(0.8), border = grey(0.5),
main = "Distribution of Student's z score for 'crimtab' data")
head(crimtab)
## 142.24 144.78 147.32 149.86 152.4 154.94 157.48 160.02 162.56 165.1 167.64
## 9.4 0 0 0 0 0 0 0 0 0 0 0
## 9.5 0 0 0 0 0 1 0 0 0 0 0
## 9.6 0 0 0 0 0 0 0 0 0 0 0
## 9.7 0 0 0 0 0 0 0 0 0 0 0
## 9.8 0 0 0 0 0 0 1 0 0 0 0
## 9.9 0 0 1 0 1 0 1 0 0 0 0
## 170.18 172.72 175.26 177.8 180.34 182.88 185.42 187.96 190.5 193.04 195.58
## 9.4 0 0 0 0 0 0 0 0 0 0 0
## 9.5 0 0 0 0 0 0 0 0 0 0 0
## 9.6 0 0 0 0 0 0 0 0 0 0 0
## 9.7 0 0 0 0 0 0 0 0 0 0 0
## 9.8 0 0 0 0 0 0 0 0 0 0 0
## 9.9 0 0 0 0 0 0 0 0 0 0 0
example(crimtab)
##
## crimtb> require(stats)
##
## crimtb> dim(crimtab)
## [1] 42 22
##
## crimtb> utils::str(crimtab)
## 'table' int [1:42, 1:22] 0 0 0 0 0 0 1 0 0 0 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:42] "9.4" "9.5" "9.6" "9.7" ...
## ..$ : chr [1:22] "142.24" "144.78" "147.32" "149.86" ...
##
## crimtb> ## for nicer printing:
## crimtb> local({cT <- crimtab
## crimtb+ colnames(cT) <- substring(colnames(cT), 2, 3)
## crimtb+ print(cT, zero.print = " ")
## crimtb+ })
## 42 44 47 49 52 54 57 60 62 65 67 70 72 75 77 80 82 85 87 90 93 95
## 9.4
## 9.5 1
## 9.6
## 9.7
## 9.8 1
## 9.9 1 1 1
## 10 1 1 2 2 1
## 10.1 1 3 1 1 1
## 10.2 2 2 2 1 2 1
## 10.3 1 1 3 2 2 3 5
## 10.4 1 1 2 3 3 4 3 3
## 10.5 1 3 7 6 4 3 1 3 1 1
## 10.6 1 4 5 9 14 6 3 1 1
## 10.7 1 2 4 9 14 16 15 7 3 1 2
## 10.8 2 5 6 14 27 10 7 1 2 1
## 10.9 2 6 14 24 27 14 10 4 1
## 11 2 6 12 15 31 37 27 17 10 6
## 11.1 3 3 12 22 26 24 26 24 7 4 1
## 11.2 3 2 7 21 30 38 29 27 20 4 1 1
## 11.3 1 5 10 24 26 39 26 24 7 2
## 11.4 3 4 9 29 56 58 26 22 10 11
## 11.5 5 11 17 33 57 38 34 25 11 2
## 11.6 2 1 4 13 37 39 48 38 27 12 2 2 1
## 11.7 2 9 17 30 37 48 45 24 9 9 2
## 11.8 1 2 11 15 35 41 34 29 10 5 1
## 11.9 1 1 2 12 10 27 32 35 19 10 9 3 1
## 12 1 4 8 19 42 39 22 16 8 2 2
## 12.1 2 4 13 22 28 15 27 10 4 1
## 12.2 1 2 5 6 23 17 16 11 8 1 1
## 12.3 4 8 10 13 20 23 6 5
## 12.4 1 1 1 2 7 12 4 7 7 1 1
## 12.5 1 1 3 12 11 8 6 8 2
## 12.6 1 3 5 7 8 6 3 1 1
## 12.7 1 1 7 5 5 8 2 2
## 12.8 1 2 3 1 8 5 3 1 1
## 12.9 1 2 2 1 1
## 13 3 1 1 2 1
## 13.1 1 1
## 13.2 1 1 1 3
## 13.3 1 1
## 13.4
## 13.5 1
##
## crimtb> ## Repeat Student's experiment:
## crimtb>
## crimtb> # 1) Reconstitute 3000 raw data for heights in inches and rounded to
## crimtb> # nearest integer as in Student's paper:
## crimtb>
## crimtb> (heIn <- round(as.numeric(colnames(crimtab)) / 2.54))
## [1] 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
##
## crimtb> d.hei <- data.frame(height = rep(heIn, colSums(crimtab)))
##
## crimtb> # 2) shuffle the data:
## crimtb>
## crimtb> set.seed(1)
##
## crimtb> d.hei <- d.hei[sample(1:3000), , drop = FALSE]
##
## crimtb> # 3) Make 750 samples each of size 4:
## crimtb>
## crimtb> d.hei$sample <- as.factor(rep(1:750, each = 4))
##
## crimtb> # 4) Compute the means and standard deviations (n) for the 750 samples:
## crimtb>
## crimtb> h.mean <- with(d.hei, tapply(height, sample, FUN = mean))
##
## crimtb> h.sd <- with(d.hei, tapply(height, sample, FUN = sd)) * sqrt(3/4)
##
## crimtb> # 5) Compute the difference between the mean of each sample and
## crimtb> # the mean of the population and then divide by the
## crimtb> # standard deviation of the sample:
## crimtb>
## crimtb> zobs <- (h.mean - mean(d.hei[,"height"]))/h.sd
##
## crimtb> # 6) Replace infinite values by +/- 6 as in Student's paper:
## crimtb>
## crimtb> zobs[infZ <- is.infinite(zobs)] # none of them
## named numeric(0)
##
## crimtb> zobs[infZ] <- 6 * sign(zobs[infZ])
##
## crimtb> # 7) Plot the distribution:
## crimtb>
## crimtb> require(grDevices); require(graphics)
##
## crimtb> hist(x = zobs, probability = TRUE, xlab = "Student's z",
## crimtb+ col = grey(0.8), border = grey(0.5),
## crimtb+ main = "Distribution of Student's z score for 'crimtab' data")
datasets-package The R Datasets Package
Description
Base R datasets
Details
This package contains a variety of datasets. For a complete list, use library(help = "datasets").
Author(s)
R Core Team and contributors worldwide
Maintainer: R Core Team R-core@r-project.org
discoveries Yearly Numbers of Important Discoveries
Description
The numbers of “great” inventions and scientific discoveries in each year from 1860 to 1959.
Usage
discoveries
Format
A time series of 100 values.
Source
The World Almanac and Book of Facts, 1975 Edition, pages 315–318.
References
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
Examples
require(graphics)
plot(discoveries, ylab = "Number of important discoveries",
las = 1)
title(main = "discoveries data set")
head(discoveries)
## [1] 5 3 0 2 0 3
example(discoveries)
##
## dscvrs> require(graphics)
##
## dscvrs> plot(discoveries, ylab = "Number of important discoveries",
## dscvrs+ las = 1)
##
## dscvrs> title(main = "discoveries data set")
esoph Smoking, Alcohol and (O)esophageal Cancer
Description
Data from a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France.
Usage
esoph
Format
A data frame with records for 88 age/alcohol/tobacco combinations.
[,1] "agegp" Age group 1 25--34 years
2 35--44
3 45--54
4 55--64
5 65--74
6 75+
[,2] "alcgp" Alcohol consumption 1 0--39 gm/day
2 40--79
3 80--119
4 120+
[,3] "tobgp" Tobacco consumption 1 0-- 9 gm/day
2 10--19
3 20--29
4 30+
[,4] "ncases" Number of cases
[,5] "ncontrols" Number of controls
Author(s)
Thomas Lumley
Source
Breslow, N. E. and Day, N. E. (1980) Statistical Methods in Cancer Research. Volume 1: The Analysis of Case-Control Studies. IARC Lyon / Oxford University Press.
Examples
require(stats)
require(graphics) # for mosaicplot
summary(esoph)
## effects of alcohol, tobacco and interaction, age-adjusted
model1 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp,
data = esoph, family = binomial())
anova(model1)
## Try a linear effect of alcohol and tobacco
model2 <- glm(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp)
+ unclass(alcgp),
data = esoph, family = binomial())
summary(model2)
## Re-arrange data for a mosaic plot
ttt <- table(esoph$agegp, esoph$alcgp, esoph$tobgp)
o <- with(esoph, order(tobgp, alcgp, agegp))
ttt[ttt == 1] <- esoph$ncases[o]
tt1 <- table(esoph$agegp, esoph$alcgp, esoph$tobgp)
tt1[tt1 == 1] <- esoph$ncontrols[o]
tt <- array(c(ttt, tt1), c(dim(ttt),2),
c(dimnames(ttt), list(c("Cancer", "control"))))
mosaicplot(tt, main = "esoph data set", color = TRUE)
head(esoph)
example(esoph)
##
## esoph> require(stats)
##
## esoph> require(graphics) # for mosaicplot
##
## esoph> summary(esoph)
## agegp alcgp tobgp ncases ncontrols
## 25-34:15 0-39g/day:23 0-9g/day:24 Min. : 0.000 Min. : 1.00
## 35-44:15 40-79 :23 10-19 :24 1st Qu.: 0.000 1st Qu.: 3.00
## 45-54:16 80-119 :21 20-29 :20 Median : 1.000 Median : 6.00
## 55-64:16 120+ :21 30+ :20 Mean : 2.273 Mean :11.08
## 65-74:15 3rd Qu.: 4.000 3rd Qu.:14.00
## 75+ :11 Max. :17.000 Max. :60.00
##
## esoph> ## effects of alcohol, tobacco and interaction, age-adjusted
## esoph> model1 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp,
## esoph+ data = esoph, family = binomial())
##
## esoph> anova(model1)
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: cbind(ncases, ncontrols)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 87 227.241
## agegp 5 88.128 82 139.112
## tobgp 3 19.085 79 120.028
## alcgp 3 66.054 76 53.973
## tobgp:alcgp 9 6.489 67 47.484
##
## esoph> ## Try a linear effect of alcohol and tobacco
## esoph> model2 <- glm(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp)
## esoph+ + unclass(alcgp),
## esoph+ data = esoph, family = binomial())
##
## esoph> summary(model2)
##
## Call:
## glm(formula = cbind(ncases, ncontrols) ~ agegp + unclass(tobgp) +
## unclass(alcgp), family = binomial(), data = esoph)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7628 -0.6426 -0.2709 0.3043 2.0421
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.01097 0.31224 -12.846 < 2e-16 ***
## agegp.L 2.96113 0.65092 4.549 5.39e-06 ***
## agegp.Q -1.33735 0.58918 -2.270 0.02322 *
## agegp.C 0.15292 0.44792 0.341 0.73281
## agegp^4 0.06668 0.30776 0.217 0.82848
## agegp^5 -0.20288 0.19523 -1.039 0.29872
## unclass(tobgp) 0.26162 0.08198 3.191 0.00142 **
## unclass(alcgp) 0.65308 0.08452 7.727 1.10e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 227.241 on 87 degrees of freedom
## Residual deviance: 59.277 on 80 degrees of freedom
## AIC: 222.76
##
## Number of Fisher Scoring iterations: 6
##
##
## esoph> ## Re-arrange data for a mosaic plot
## esoph> ttt <- table(esoph$agegp, esoph$alcgp, esoph$tobgp)
##
## esoph> o <- with(esoph, order(tobgp, alcgp, agegp))
##
## esoph> ttt[ttt == 1] <- esoph$ncases[o]
##
## esoph> tt1 <- table(esoph$agegp, esoph$alcgp, esoph$tobgp)
##
## esoph> tt1[tt1 == 1] <- esoph$ncontrols[o]
##
## esoph> tt <- array(c(ttt, tt1), c(dim(ttt),2),
## esoph+ c(dimnames(ttt), list(c("Cancer", "control"))))
##
## esoph> mosaicplot(tt, main = "esoph data set", color = TRUE)
euro Conversion Rates of Euro Currencies
Description
Conversion rates between the various Euro currencies.
Usage
euro
euro.cross
Format
euro is a named vector of length 11, euro.cross a matrix of size 11 by 11, with dimnames.
Details
The data set euro contains the value of 1 Euro in all currencies participating in the European monetary union (Austrian Schilling ATS, Belgian Franc BEF, German Mark DEM, Spanish Peseta ESP, Finnish Markka FIM, French Franc FRF, Irish Punt IEP, Italian Lira ITL, Luxembourg Franc LUF, Dutch Guilder NLG and Portuguese Escudo PTE). These conversion rates were fixed by the European Union on December 31, 1998. To convert old prices to Euro prices, divide by the respective rate and round to 2 digits.
The data set euro.cross contains conversion rates between the various Euro currencies, i.e., the result of outer(1 / euro, euro).
Examples
cbind(euro)
## These relations hold:
euro == signif(euro, 6) # [6 digit precision in Euro's definition]
all(euro.cross == outer(1/euro, euro))
## Convert 20 Euro to Belgian Franc
20 * euro["BEF"]
## Convert 20 Austrian Schilling to Euro
20 / euro["ATS"]
## Convert 20 Spanish Pesetas to Italian Lira
20 * euro.cross["ESP", "ITL"]
require(graphics)
dotchart(euro,
main = "euro data: 1 Euro in currency unit")
dotchart(1/euro,
main = "euro data: 1 currency unit in Euros")
dotchart(log(euro, 10),
main = "euro data: log10(1 Euro in currency unit)")
head(euro)
## ATS BEF DEM ESP FIM FRF
## 13.76030 40.33990 1.95583 166.38600 5.94573 6.55957
head(euro.cross)
## ATS BEF DEM ESP FIM FRF
## ATS 1.00000000 2.9316149 0.14213571 12.091742 0.43209305 0.47670254
## BEF 0.34110893 1.0000000 0.04848376 4.124601 0.14739080 0.16260749
## DEM 7.03552967 20.6254634 1.00000000 85.071811 3.04000348 3.35385489
## ESP 0.08270107 0.2424477 0.01175478 1.000000 0.03573456 0.03942381
## FIM 2.31431632 6.7846841 0.32894699 27.984116 1.00000000 1.10324048
## FRF 2.09774421 6.1497781 0.29816436 25.365382 0.90642070 1.00000000
## IEP ITL LUF NLG PTE
## ATS 0.057234508 140.71423 2.9316149 0.16014985 14.569595
## BEF 0.019523202 47.99888 1.0000000 0.05462854 4.969819
## DEM 0.402675079 989.99913 20.6254634 1.12673903 102.504819
## ESP 0.004733355 11.63722 0.2424477 0.01324456 1.204921
## FIM 0.132458756 325.65724 6.7846841 0.37063742 33.718652
## FRF 0.120063358 295.18246 6.1497781 0.33595342 30.563284
example(euro)
##
## euro> cbind(euro)
## euro
## ATS 13.760300
## BEF 40.339900
## DEM 1.955830
## ESP 166.386000
## FIM 5.945730
## FRF 6.559570
## IEP 0.787564
## ITL 1936.270000
## LUF 40.339900
## NLG 2.203710
## PTE 200.482000
##
## euro> ## These relations hold:
## euro> euro == signif(euro, 6) # [6 digit precision in Euro's definition]
## ATS BEF DEM ESP FIM FRF IEP ITL LUF NLG PTE
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##
## euro> all(euro.cross == outer(1/euro, euro))
## [1] TRUE
##
## euro> ## Convert 20 Euro to Belgian Franc
## euro> 20 * euro["BEF"]
## BEF
## 806.798
##
## euro> ## Convert 20 Austrian Schilling to Euro
## euro> 20 / euro["ATS"]
## ATS
## 1.453457
##
## euro> ## Convert 20 Spanish Pesetas to Italian Lira
## euro> 20 * euro.cross["ESP", "ITL"]
## [1] 232.7443
##
## euro> require(graphics)
##
## euro> dotchart(euro,
## euro+ main = "euro data: 1 Euro in currency unit")
##
## euro> dotchart(1/euro,
## euro+ main = "euro data: 1 currency unit in Euros")
##
## euro> dotchart(log(euro, 10),
## euro+ main = "euro data: log10(1 Euro in currency unit)")
eurodist Distances Between European Cities and Between US Cities
Description
eurodist gives the road distances (in km) between 21 cities in Europe. The data are taken from a table in The Cambridge Encyclopaedia.
UScitiesD gives “straight line” distances between 10 cities in the US.
Usage
eurodist
UScitiesD
Format
dist objects based on 21 and 10 objects, respectively. (You must have the stats package loaded to have the methods for this kind of object available).
Source
Crystal, D. Ed. (1990) The Cambridge Encyclopaedia. Cambridge: Cambridge University Press.
The US cities distances were provided by Pierre Legendre.
head(eurodist)
## [1] 3313 2963 3175 3339 2762 3276
head(UScitiesD)
## [1] 587 1212 701 1936 604 748
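This entry carries no Examples section; as a sketch (not from the original page), classical multidimensional scaling via stats::cmdscale is a natural way to visualise a dist object such as eurodist, reconstructing a 2-D map of the cities. Note that the axis orientation from MDS is arbitrary:

```r
require(stats); require(graphics)
loc <- cmdscale(eurodist)   # classical MDS: 21 cities -> 2 coordinates
x <- loc[, 1]
y <- -loc[, 2]              # flip the second axis so north points up
plot(x, y, type = "n", asp = 1, xlab = "", ylab = "",
     main = "cmdscale(eurodist)")
text(x, y, labels = rownames(loc), cex = 0.6)
```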
faithful Old Faithful Geyser Data
Description
Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
Usage
faithful
Format
A data frame with 272 observations on 2 variables.
[,1] eruptions numeric Eruption time in mins
[,2] waiting numeric Waiting time to next eruption (in mins)
Details
A closer look at faithful$eruptions reveals that these are heavily rounded times that were originally measured in seconds: multiples of 5 seconds occur far more often than chance measurement would produce, suggesting human rounding. For a better version of the eruption times, see the example below.
There are many versions of this dataset around: Azzalini and Bowman (1990) use a more complete version.
Source
W. Härdle.
References
Härdle, W. (1991). Smoothing Techniques with Implementation in S. New York: Springer.
Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics, 39, 357–365. doi: 10.2307/2347385.
See Also
geyser in package MASS for the Azzalini–Bowman version.
Examples
require(stats); require(graphics)
f.tit <- "faithful data: Eruptions of Old Faithful"
ne60 <- round(e60 <- 60 * faithful$eruptions)
all.equal(e60, ne60) # relative diff. ~ 1/10000
table(zapsmall(abs(e60 - ne60))) # 0, 0.02 or 0.04
faithful$better.eruptions <- ne60 / 60
te <- table(ne60)
te[te >= 4] # (too) many multiples of 5 !
plot(names(te), te, type = "h", main = f.tit, xlab = "Eruption time (sec)")
plot(faithful[, -3], main = f.tit,
xlab = "Eruption time (min)",
ylab = "Waiting time to next eruption (min)")
lines(lowess(faithful$eruptions, faithful$waiting, f = 2/3, iter = 3),
col = "red")
head(faithful)
example(faithful)
##
## fathfl> require(stats); require(graphics)
##
## fathfl> f.tit <- "faithful data: Eruptions of Old Faithful"
##
## fathfl> ne60 <- round(e60 <- 60 * faithful$eruptions)
##
## fathfl> all.equal(e60, ne60) # relative diff. ~ 1/10000
## [1] "Mean relative difference: 9.515332e-05"
##
## fathfl> table(zapsmall(abs(e60 - ne60))) # 0, 0.02 or 0.04
##
## 0 0.02 0.04
## 106 163 3
##
## fathfl> faithful$better.eruptions <- ne60 / 60
##
## fathfl> te <- table(ne60)
##
## fathfl> te[te >= 4] # (too) many multiples of 5 !
## ne60
## 105 108 110 112 113 120 216 230 240 245 249 250 255 260 261 262 265 270 272 275
## 6 4 7 8 4 4 4 5 6 5 4 4 4 5 4 4 4 8 5 4
## 276 282 288
## 4 6 6
##
## fathfl> plot(names(te), te, type = "h", main = f.tit, xlab = "Eruption time (sec)")
##
## fathfl> plot(faithful[, -3], main = f.tit,
## fathfl+ xlab = "Eruption time (min)",
## fathfl+ ylab = "Waiting time to next eruption (min)")
##
## fathfl> lines(lowess(faithful$eruptions, faithful$waiting, f = 2/3, iter = 3),
## fathfl+ col = "red")
freeny Freeny’s Revenue Data
Description
Freeny's data on quarterly revenue and explanatory variables.
Usage
freeny
freeny.x
freeny.y
Format
There are three ‘freeny’ data sets.
freeny.y is a time series with 39 observations on quarterly revenue from (1962,2Q) to (1971,4Q).
freeny.x is a matrix of explanatory variables. The columns are freeny.y lagged 1 quarter, price index, income level, and market potential.
Finally, freeny is a data frame with variables y, lag.quarterly.revenue, price.index, income.level, and market.potential obtained from the above two data objects.
Source
A. E. Freeny (1977) A Portable Linear Regression Package with Test Programs. Bell Laboratories memorandum.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Examples
require(stats); require(graphics)
summary(freeny)
pairs(freeny, main = "freeny data")
# gives warning: freeny$y has class "ts"
summary(fm1 <- lm(y ~ ., data = freeny))
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
mar = c(4.1, 4.1, 2.1, 1.1))
plot(fm1)
par(opar)
head(freeny)
head(freeny.x)
## lag quarterly revenue price index income level market potential
## [1,] 8.79636 4.70997 5.82110 12.9699
## [2,] 8.79236 4.70217 5.82558 12.9733
## [3,] 8.79137 4.68944 5.83112 12.9774
## [4,] 8.81486 4.68558 5.84046 12.9806
## [5,] 8.81301 4.64019 5.85036 12.9831
## [6,] 8.90751 4.62553 5.86464 12.9854
head(freeny.y)
## [1] 8.79236 8.79137 8.81486 8.81301 8.90751 8.93673
example(freeny)
##
## freeny> require(stats); require(graphics)
##
## freeny> summary(freeny)
## y lag.quarterly.revenue price.index income.level
## Min. :8.791 Min. :8.791 Min. :4.278 Min. :5.821
## 1st Qu.:9.045 1st Qu.:9.020 1st Qu.:4.392 1st Qu.:5.948
## Median :9.314 Median :9.284 Median :4.510 Median :6.061
## Mean :9.306 Mean :9.281 Mean :4.496 Mean :6.039
## 3rd Qu.:9.591 3rd Qu.:9.561 3rd Qu.:4.605 3rd Qu.:6.139
## Max. :9.794 Max. :9.775 Max. :4.710 Max. :6.200
## market.potential
## Min. :12.97
## 1st Qu.:13.01
## Median :13.07
## Mean :13.07
## 3rd Qu.:13.12
## Max. :13.17
##
## freeny> pairs(freeny, main = "freeny data")
##
## freeny> # gives warning: freeny$y has class "ts"
## freeny>
## freeny> summary(fm1 <- lm(y ~ ., data = freeny))
##
## Call:
## lm(formula = y ~ ., data = freeny)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0259426 -0.0101033 0.0003824 0.0103236 0.0267124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.4726 6.0217 -1.739 0.0911 .
## lag.quarterly.revenue 0.1239 0.1424 0.870 0.3904
## price.index -0.7542 0.1607 -4.693 4.28e-05 ***
## income.level 0.7675 0.1339 5.730 1.93e-06 ***
## market.potential 1.3306 0.5093 2.613 0.0133 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01473 on 34 degrees of freedom
## Multiple R-squared: 0.9981, Adjusted R-squared: 0.9978
## F-statistic: 4354 on 4 and 34 DF, p-value: < 2.2e-16
##
##
## freeny> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
## freeny+ mar = c(4.1, 4.1, 2.1, 1.1))
##
## freeny> plot(fm1)
##
## freeny> par(opar)
infert Infertility after Spontaneous and Induced Abortion
Description
This is a matched case-control study dating from before the availability of conditional logistic regression.
Usage
infert
Format
1. Education:                              0 = 0-5 years, 1 = 6-11 years, 2 = 12+ years
2. Age:                                    age in years of case
3. Parity:                                 count
4. Number of prior induced abortions:      0 = 0, 1 = 1, 2 = 2 or more
5. Case status:                            1 = case, 0 = control
6. Number of prior spontaneous abortions:  0 = 0, 1 = 1, 2 = 2 or more
7. Matched set number:                     1-83
8. Stratum number:                         1-63
Note
One case with two prior spontaneous abortions and two prior induced abortions is omitted.
Source
Trichopoulos, D. et al. (1976). British Journal of Obstetrics and Gynaecology, 83, 645–650.
Examples
require(stats)
model1 <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
summary(model1)
## adjusted for other potential confounders:
summary(model2 <- glm(case ~ age+parity+education+spontaneous+induced,
data = infert, family = binomial()))
## Really should be analysed by conditional logistic regression
## which is in the survival package
if(require(survival)){
model3 <- clogit(case ~ spontaneous+induced+strata(stratum), data = infert)
print(summary(model3))
detach() # survival (conflicts)
}
head(infert)
example(infert)
##
## infert> require(stats)
##
## infert> model1 <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
##
## infert> summary(model1)
##
## Call:
## glm(formula = case ~ spontaneous + induced, family = binomial(),
## data = infert)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6678 -0.8360 -0.5772 0.9030 1.9362
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.7079 0.2677 -6.380 1.78e-10 ***
## spontaneous 1.1972 0.2116 5.657 1.54e-08 ***
## induced 0.4181 0.2056 2.033 0.042 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 316.17 on 247 degrees of freedom
## Residual deviance: 279.61 on 245 degrees of freedom
## AIC: 285.61
##
## Number of Fisher Scoring iterations: 4
##
##
## infert> ## adjusted for other potential confounders:
## infert> summary(model2 <- glm(case ~ age+parity+education+spontaneous+induced,
## infert+ data = infert, family = binomial()))
##
## Call:
## glm(formula = case ~ age + parity + education + spontaneous +
## induced, family = binomial(), data = infert)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7603 -0.8162 -0.4956 0.8349 2.6536
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.14924 1.41220 -0.814 0.4158
## age 0.03958 0.03120 1.269 0.2046
## parity -0.82828 0.19649 -4.215 2.49e-05 ***
## education6-11yrs -1.04424 0.79255 -1.318 0.1876
## education12+ yrs -1.40321 0.83416 -1.682 0.0925 .
## spontaneous 2.04591 0.31016 6.596 4.21e-11 ***
## induced 1.28876 0.30146 4.275 1.91e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 316.17 on 247 degrees of freedom
## Residual deviance: 257.80 on 241 degrees of freedom
## AIC: 271.8
##
## Number of Fisher Scoring iterations: 4
##
##
## infert> ## Really should be analysed by conditional logistic regression
## infert> ## which is in the survival package
## infert> ## No test:
## infert> ##D if(require(survival)){
## infert> ##D model3 <- clogit(case ~ spontaneous+induced+strata(stratum), data = infert)
## infert> ##D print(summary(model3))
## infert> ##D detach() # survival (conflicts)
## infert> ##D }
## infert> ## End(No test)
## infert>
## infert>
## infert>
iris Edgar Anderson’s Iris Data
Description
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
Usage
iris
iris3
Format
iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
iris3 gives the same data arranged as a 3-dimensional array of size 50 by 4 by 3, as represented by S-PLUS. The first dimension gives the case number within the species subsample, the second the measurements with names Sepal L., Sepal W., Petal L., and Petal W., and the third the species.
Source
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179–188.
The data were collected by Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (has iris3 as iris.)
See Also
matplot some examples of which use iris.
Examples
dni3 <- dimnames(iris3)
ii <- data.frame(matrix(aperm(iris3, c(1,3,2)), ncol = 4,
dimnames = list(NULL, sub(" L.",".Length",
sub(" W.",".Width", dni3[[2]])))),
Species = gl(3, 50, labels = sub("S", "s", sub("V", "v", dni3[[3]]))))
all.equal(ii, iris) # TRUE
head(iris)
head(iris3)
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
example(iris)
##
## iris> dni3 <- dimnames(iris3)
##
## iris> ii <- data.frame(matrix(aperm(iris3, c(1,3,2)), ncol = 4,
## iris+ dimnames = list(NULL, sub(" L.",".Length",
## iris+ sub(" W.",".Width", dni3[[2]])))),
## iris+ Species = gl(3, 50, labels = sub("S", "s", sub("V", "v", dni3[[3]]))))
##
## iris> all.equal(ii, iris) # TRUE
## [1] TRUE
islands Areas of the World’s Major Landmasses
Description
The areas in thousands of square miles of the landmasses which exceed 10,000 square miles.
Usage
islands
Format
A named vector of length 48.
Source
The World Almanac and Book of Facts, 1975, page 406.
References
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
Examples
require(graphics)
dotchart(log(islands, 10),
main = "islands data: log10(area) (log10(sq. miles))")
dotchart(log(islands[order(islands)], 10),
main = "islands data: log10(area) (log10(sq. miles))")
head(islands)
## Africa Antarctica Asia Australia Axel Heiberg Baffin
## 11506 5500 16988 2968 16 184
example(islands)
##
## islnds> require(graphics)
##
## islnds> dotchart(log(islands, 10),
## islnds+ main = "islands data: log10(area) (log10(sq. miles))")
##
## islnds> dotchart(log(islands[order(islands)], 10),
## islnds+ main = "islands data: log10(area) (log10(sq. miles))")
lh Luteinizing Hormone in Blood Samples
Description
A regular time series giving the luteinizing hormone level in blood samples taken at 10-minute intervals from a human female; 48 samples.
Usage
lh
Source
P.J. Diggle (1990) Time Series: A Biostatistical Introduction. Oxford, table A.1, series 3
head(lh)
## [1] 2.4 2.4 2.4 2.2 2.1 1.5
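No Examples block accompanies this entry; a minimal sketch of how the series might be inspected, using only base stats and graphics (a plot of the series and its autocorrelation function):

```r
require(stats); require(graphics)
plot(lh, ylab = "LH level", main = "lh data set")
acf(lh)   # serial correlation between successive 10-minute samples
```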
longley Longley’s Economic Regression Data
Description
A macroeconomic data set which provides a well-known example for a highly collinear regression.
Usage
longley
Format
A data frame with 7 economic variables, observed yearly from 1947 to 1962 (n = 16).
GNP.deflator
GNP implicit price deflator (1954=100)
GNP
Gross National Product.
Unemployed
number of unemployed.
Armed.Forces
number of people in the armed forces.
Population
‘noninstitutionalized’ population ≥ 14 years of age.
Year
the year (time).
Employed
number of people employed.
The regression lm(Employed ~ .) is known to be highly collinear.
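The collinearity claim can be illustrated numerically; a hedged sketch (not part of the original Examples) using the condition number of the scaled predictor matrix via base R's kappa(), plus one of the near-unit pairwise correlations:

```r
X <- data.matrix(longley[, 1:6])      # the six predictors
kappa(scale(X))                       # condition number; a large value signals near-collinearity
cor(longley$GNP, longley$Population)  # several predictor pairs correlate close to 1
```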
Source
J. W. Longley (1967) An appraisal of least-squares programs from the point of view of the user. Journal of the American Statistical Association 62, 819–841.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Examples
require(stats); require(graphics)
## give the data set in the form it is used in S-PLUS:
longley.x <- data.matrix(longley[, 1:6])
longley.y <- longley[, "Employed"]
pairs(longley, main = "longley data")
summary(fm1 <- lm(Employed ~ ., data = longley))
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
mar = c(4.1, 4.1, 2.1, 1.1))
plot(fm1)
par(opar)
head(longley)
example(longley)
##
## longly> require(stats); require(graphics)
##
## longly> ## give the data set in the form it is used in S-PLUS:
## longly> longley.x <- data.matrix(longley[, 1:6])
##
## longly> longley.y <- longley[, "Employed"]
##
## longly> pairs(longley, main = "longley data")
##
## longly> summary(fm1 <- lm(Employed ~ ., data = longley))
##
## Call:
## lm(formula = Employed ~ ., data = longley)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41011 -0.15767 -0.02816 0.10155 0.45539
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.482e+03 8.904e+02 -3.911 0.003560 **
## GNP.deflator 1.506e-02 8.492e-02 0.177 0.863141
## GNP -3.582e-02 3.349e-02 -1.070 0.312681
## Unemployed -2.020e-02 4.884e-03 -4.136 0.002535 **
## Armed.Forces -1.033e-02 2.143e-03 -4.822 0.000944 ***
## Population -5.110e-02 2.261e-01 -0.226 0.826212
## Year 1.829e+00 4.555e-01 4.016 0.003037 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3049 on 9 degrees of freedom
## Multiple R-squared: 0.9955, Adjusted R-squared: 0.9925
## F-statistic: 330.3 on 6 and 9 DF, p-value: 4.984e-10
##
##
## longly> opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
## longly+ mar = c(4.1, 4.1, 2.1, 1.1))
##
## longly> plot(fm1)
##
## longly> par(opar)
lynx Annual Canadian Lynx trappings 1821-1934
Description
Annual numbers of lynx trappings for 1821–1934 in Canada. Taken from Brockwell & Davis (1991), this appears to be the series considered by Campbell & Walker (1977).
Usage
lynx
Source
Brockwell, P. J. and Davis, R. A. (1991). Time Series and Forecasting Methods. Second edition. Springer. Series G (page 557).
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.
Campbell, M. J. and Walker, A. M. (1977). A Survey of statistical work on the Mackenzie River series of annual Canadian lynx trappings for the years 1821–1934 and a new analysis. Journal of the Royal Statistical Society series A, 140, 411–431. doi: 10.2307/2345277.
head(lynx)
## [1] 269 321 585 871 1475 2821
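This entry has no Examples section; a short sketch of how the series is commonly displayed (the roughly 10-year population cycle is clearer on a log scale):

```r
require(graphics)
plot(lynx, ylab = "Number trapped", main = "lynx data set")
plot(log10(lynx), ylab = "log10(number trapped)")  # cycle stands out on the log scale
```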
morley Michelson Speed of Light Data
Description
Classical measurements by Michelson (from 1879; despite the dataset's name, not the later Michelson–Morley experiment) of the speed of light. The data consist of five experiments, each comprising 20 consecutive ‘runs’. The response is the speed-of-light measurement, suitably coded (km/sec, with 299000 subtracted).
Usage
morley
Format
A data frame with 100 observations on the following 3 variables.
Expt
The experiment number, from 1 to 5.
Run
The run number within each experiment.
Speed
Speed-of-light measurement.
Details
The data is here viewed as a randomized block experiment with ‘experiment’ and ‘run’ as the factors. ‘run’ may also be considered a quantitative variate to account for linear (or polynomial) changes in the measurement over the course of a single experiment.
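As a hedged sketch of the quantitative-‘run’ analysis mentioned above (not part of the original Examples): fit a polynomial trend in Run alongside the experiment factor and compare it against the experiment-only model:

```r
require(stats)
fm.q <- aov(Speed ~ factor(Expt) + poly(Run, 2), data = morley)  # quadratic drift in Run
fm.0 <- aov(Speed ~ factor(Expt), data = morley)                 # experiment effect only
anova(fm.0, fm.q)   # does a smooth drift over the runs improve the fit?
```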
Note
This is the same dataset as michelson in package MASS.
Source
A. J. Weekes (1986) A Genstat Primer. London: Edward Arnold.
S. M. Stigler (1977) Do robust estimators work with real data? Annals of Statistics 5, 1055–1098. (See Table 6.)
A. A. Michelson (1882) Experimental determination of the velocity of light made at the United States Naval Academy, Annapolis. Astronomic Papers 1 135–8. U.S. Nautical Almanac Office. (See Table 24.)
Examples
require(stats); require(graphics)
michelson <- transform(morley,
Expt = factor(Expt), Run = factor(Run))
xtabs(~ Expt + Run, data = michelson) # 5 x 20 balanced (two-way)
plot(Speed ~ Expt, data = michelson,
main = "Speed of Light Data", xlab = "Experiment No.")
fm <- aov(Speed ~ Run + Expt, data = michelson)
summary(fm)
fm0 <- update(fm, . ~ . - Run)
anova(fm0, fm)
head(morley)
example(morley)
##
## morley> require(stats); require(graphics)
##
## morley> michelson <- transform(morley,
## morley+ Expt = factor(Expt), Run = factor(Run))
##
## morley> xtabs(~ Expt + Run, data = michelson) # 5 x 20 balanced (two-way)
## Run
## Expt 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## morley> plot(Speed ~ Expt, data = michelson,
## morley+ main = "Speed of Light Data", xlab = "Experiment No.")
##
## morley> fm <- aov(Speed ~ Run + Expt, data = michelson)
##
## morley> summary(fm)
## Df Sum Sq Mean Sq F value Pr(>F)
## Run 19 113344 5965 1.105 0.36321
## Expt 4 94514 23629 4.378 0.00307 **
## Residuals 76 410166 5397
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## morley> fm0 <- update(fm, . ~ . - Run)
##
## morley> anova(fm0, fm)
## Analysis of Variance Table
##
## Model 1: Speed ~ Expt
## Model 2: Speed ~ Run + Expt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 95 523510
## 2 76 410166 19 113344 1.1053 0.3632
mtcars Motor Trend Car Road Tests
Description
The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Usage
mtcars
Format
A data frame with 32 observations on 11 (numeric) variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
Note
Henderson and Velleman (1981) comment in a footnote to Table 1: ‘Hocking [original transcriber]'s noncrucial coding of the Mazda's rotary engine as a straight six-cylinder engine and the Porsche's flat engine as a V engine, as well as the inclusion of the diesel Mercedes 240D, have been retained to enable direct comparisons to be made with previous analyses.’
Source
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
Examples
require(graphics)
pairs(mtcars, main = "mtcars data", gap = 1/4)
coplot(mpg ~ disp | as.factor(cyl), data = mtcars,
panel = panel.smooth, rows = 1)
## possibly more meaningful, e.g., for summary() or bivariate plots:
mtcars2 <- within(mtcars, {
vs <- factor(vs, labels = c("V", "S"))
am <- factor(am, labels = c("automatic", "manual"))
cyl <- ordered(cyl)
gear <- ordered(gear)
carb <- ordered(carb)
})
summary(mtcars2)
head(mtcars)
example(mtcars)
##
## mtcars> require(graphics)
##
## mtcars> pairs(mtcars, main = "mtcars data", gap = 1/4)
##
## mtcars> coplot(mpg ~ disp | as.factor(cyl), data = mtcars,
## mtcars+ panel = panel.smooth, rows = 1)
##
## mtcars> ## possibly more meaningful, e.g., for summary() or bivariate plots:
## mtcars> mtcars2 <- within(mtcars, {
## mtcars+ vs <- factor(vs, labels = c("V", "S"))
## mtcars+ am <- factor(am, labels = c("automatic", "manual"))
## mtcars+ cyl <- ordered(cyl)
## mtcars+ gear <- ordered(gear)
## mtcars+ carb <- ordered(carb)
## mtcars+ })
##
## mtcars> summary(mtcars2)
## mpg cyl disp hp drat
## Min. :10.40 4:11 Min. : 71.1 Min. : 52.0 Min. :2.760
## 1st Qu.:15.43 6: 7 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
## Median :19.20 8:14 Median :196.3 Median :123.0 Median :3.695
## Mean :20.09 Mean :230.7 Mean :146.7 Mean :3.597
## 3rd Qu.:22.80 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
## Max. :33.90 Max. :472.0 Max. :335.0 Max. :4.930
## wt qsec vs am gear carb
## Min. :1.513 Min. :14.50 V:18 automatic:19 3:15 1: 7
## 1st Qu.:2.581 1st Qu.:16.89 S:14 manual :13 4:12 2:10
## Median :3.325 Median :17.71 5: 5 3: 3
## Mean :3.217 Mean :17.85 4:10
## 3rd Qu.:3.610 3rd Qu.:18.90 6: 1
## Max. :5.424 Max. :22.90 8: 1
nhtemp Average Yearly Temperatures in New Haven
Description
The mean annual temperature in degrees Fahrenheit in New Haven, Connecticut, from 1912 to 1971.
Usage
nhtemp
Format
A time series of 60 observations.
Source
Vaux, J. E. and Brinker, N. B. (1972) Cycles, 1972, 117–121.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(stats); require(graphics)
plot(nhtemp, main = "nhtemp data",
ylab = "Mean annual temperature in New Haven, CT (deg. F)")
head(nhtemp)
## [1] 49.9 52.3 49.4 51.1 49.4 47.9
example(nhtemp)
##
## nhtemp> require(stats); require(graphics)
##
## nhtemp> plot(nhtemp, main = "nhtemp data",
## nhtemp+ ylab = "Mean annual temperature in New Haven, CT (deg. F)")
nottem Average Monthly Temperatures at Nottingham, 1920-1939
Description
A time series object containing average air temperatures at Nottingham Castle in degrees Fahrenheit for 20 years.
Usage
nottem
Source
Anderson, O. D. (1976) Time Series Analysis and Forecasting: The Box-Jenkins approach. Butterworths. Series R.
Examples
require(stats); require(graphics)
nott <- window(nottem, end = c(1936,12))
fit <- arima(nott, order = c(1,0,0), list(order = c(2,1,0), period = 12))
nott.fore <- predict(fit, n.ahead = 36)
ts.plot(nott, nott.fore$pred, nott.fore$pred+2*nott.fore$se,
nott.fore$pred-2*nott.fore$se, gpars = list(col = c(1,1,4,4)))
head(nottem)
## [1] 40.6 40.8 44.4 46.7 54.1 58.5
example(nottem)
##
## nottem> ## No test:
## nottem> ##D require(stats); require(graphics)
## nottem> ##D nott <- window(nottem, end = c(1936,12))
## nottem> ##D fit <- arima(nott, order = c(1,0,0), list(order = c(2,1,0), period = 12))
## nottem> ##D nott.fore <- predict(fit, n.ahead = 36)
## nottem> ##D ts.plot(nott, nott.fore$pred, nott.fore$pred+2*nott.fore$se,
## nottem> ##D nott.fore$pred-2*nott.fore$se, gpars = list(col = c(1,1,4,4)))
## nottem> ## End(No test)
## nottem>
## nottem>
npk Classical N, P, K Factorial Experiment
Description
A classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on 6 blocks. Each half of a fractional factorial design confounding the NPK interaction was used on 3 of the plots.
Usage
npk
Format
The npk data frame has 24 rows and 5 columns:
block
which block (label 1 to 6).
N
indicator (0/1) for the application of nitrogen.
P
indicator (0/1) for the application of phosphate.
K
indicator (0/1) for the application of potassium.
yield
Yield of peas, in pounds/plot (the plots were (1/70) acre).
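Since the plots were 1/70 acre, yields in pounds/plot convert to pounds/acre by multiplying by 70. A quick sketch (the column name is illustrative):

```r
## Convert pea yield from pounds/plot to pounds/acre (plots were 1/70 acre)
npk.acre <- transform(npk, yield.acre = yield * 70)
head(npk.acre)
```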
Source
Imperial College, London, M.Sc. exercise sheet.
References
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
Examples
options(contrasts = c("contr.sum", "contr.poly"))
npk.aov <- aov(yield ~ block + N*P*K, npk)
npk.aov
summary(npk.aov)
coef(npk.aov)
options(contrasts = c("contr.treatment", "contr.poly"))
npk.aov1 <- aov(yield ~ block + N + K, data = npk)
summary.lm(npk.aov1)
se.contrast(npk.aov1, list(N=="0", N=="1"), data = npk)
model.tables(npk.aov1, type = "means", se = TRUE)
head(npk)
example(npk)
##
## npk> ## No test:
## npk> ##D options(contrasts = c("contr.sum", "contr.poly"))
## npk> ##D npk.aov <- aov(yield ~ block + N*P*K, npk)
## npk> ##D npk.aov
## npk> ##D summary(npk.aov)
## npk> ##D coef(npk.aov)
## npk> ##D options(contrasts = c("contr.treatment", "contr.poly"))
## npk> ##D npk.aov1 <- aov(yield ~ block + N + K, data = npk)
## npk> ##D summary.lm(npk.aov1)
## npk> ##D se.contrast(npk.aov1, list(N=="0", N=="1"), data = npk)
## npk> ##D model.tables(npk.aov1, type = "means", se = TRUE)
## npk> ## End(No test)
## npk>
## npk>
occupationalStatus Occupational Status of Fathers and their Sons
Description
Cross-classification of a sample of British males according to each subject's occupational status and his father's occupational status.
Usage
occupationalStatus
Format
A table of counts, with classifying factors origin (father's occupational status; levels 1:8) and destination (son's occupational status; levels 1:8).
Source
Goodman, L. A. (1979) Simple Models for the Analysis of Association in Cross-Classifications having Ordered Categories. J. Am. Stat. Assoc., 74 (367), 537–552.
The data set has been in package gnm and was provided by the package authors.
Examples
require(stats); require(graphics)
plot(occupationalStatus)
## Fit a uniform association model separating diagonal effects
Diag <- as.factor(diag(1:8))
Rscore <- scale(as.numeric(row(occupationalStatus)), scale = FALSE)
Cscore <- scale(as.numeric(col(occupationalStatus)), scale = FALSE)
modUnif <- glm(Freq ~ origin + destination + Diag + Rscore:Cscore,
family = poisson, data = occupationalStatus)
summary(modUnif)
plot(modUnif) # 4 plots, with warning about h_ii ~= 1
head(occupationalStatus)
## destination
## origin 1 2 3 4 5 6 7 8
## 1 50 19 26 8 7 11 6 2
## 2 16 40 34 18 11 20 8 3
## 3 12 35 65 66 35 88 23 21
## 4 11 20 58 110 40 183 64 32
## 5 2 8 12 23 25 46 28 12
## 6 12 28 102 162 90 554 230 177
example(occupationalStatus)
##
## occptS> require(stats); require(graphics)
##
## occptS> plot(occupationalStatus)
##
## occptS> ## Fit a uniform association model separating diagonal effects
## occptS> Diag <- as.factor(diag(1:8))
##
## occptS> Rscore <- scale(as.numeric(row(occupationalStatus)), scale = FALSE)
##
## occptS> Cscore <- scale(as.numeric(col(occupationalStatus)), scale = FALSE)
##
## occptS> modUnif <- glm(Freq ~ origin + destination + Diag + Rscore:Cscore,
## occptS+ family = poisson, data = occupationalStatus)
##
## occptS> summary(modUnif)
##
## Call:
## glm(formula = Freq ~ origin + destination + Diag + Rscore:Cscore,
## family = poisson, data = occupationalStatus)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6521 -0.6267 0.0000 0.5913 2.0964
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.568592 0.183358 3.101 0.001929 **
## origin2 0.431314 0.149415 2.887 0.003893 **
## origin3 1.461862 0.131141 11.147 < 2e-16 ***
## origin4 1.788731 0.126588 14.130 < 2e-16 ***
## origin5 0.441011 0.144844 3.045 0.002329 **
## origin6 2.491735 0.121219 20.556 < 2e-16 ***
## origin7 1.127536 0.129032 8.738 < 2e-16 ***
## origin8 0.796445 0.131863 6.040 1.54e-09 ***
## destination2 0.873202 0.166902 5.232 1.68e-07 ***
## destination3 1.813992 0.153861 11.790 < 2e-16 ***
## destination4 2.082515 0.150887 13.802 < 2e-16 ***
## destination5 1.366383 0.155590 8.782 < 2e-16 ***
## destination6 2.816369 0.146054 19.283 < 2e-16 ***
## destination7 1.903918 0.147810 12.881 < 2e-16 ***
## destination8 1.398585 0.151658 9.222 < 2e-16 ***
## Diag1 1.665495 0.237383 7.016 2.28e-12 ***
## Diag2 0.959681 0.212122 4.524 6.06e-06 ***
## Diag3 0.021750 0.156530 0.139 0.889490
## Diag4 0.226399 0.124364 1.820 0.068689 .
## Diag5 0.808646 0.229754 3.520 0.000432 ***
## Diag6 0.132277 0.077191 1.714 0.086597 .
## Diag7 0.506709 0.115936 4.371 1.24e-05 ***
## Diag8 0.221880 0.134803 1.646 0.099771 .
## Rscore:Cscore 0.136974 0.007489 18.289 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4679.004 on 63 degrees of freedom
## Residual deviance: 58.436 on 40 degrees of freedom
## AIC: 428.78
##
## Number of Fisher Scoring iterations: 4
##
##
## occptS> plot(modUnif) # 4 plots, with warning about h_ii ~= 1
## Warning: not plotting observations with leverage one:
## 1, 10, 19, 28, 37, 46, 55, 64
## Warning: not plotting observations with leverage one:
## 1, 10, 19, 28, 37, 46, 55, 64
precip Annual Precipitation in US Cities
Description
The average amount of precipitation (rainfall) in inches for each of 70 United States (and Puerto Rico) cities.
Usage
precip
Format
A named vector of length 70.
Note
The dataset version up to Nov. 16, 2016 had a typo in the name of "Cincinnati". The examples show how to recreate that version.
Source
Statistical Abstracts of the United States, 1975.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(graphics)
dotchart(precip[order(precip)], main = "precip data")
title(sub = "Average annual precipitation (in.)")
## Old ("wrong") version of dataset (just name change):
precip.O <- local({
p <- precip; names(p)[names(p) == "Cincinnati"] <- "Cincinati" ; p })
stopifnot(all(precip == precip.O),
match("Cincinnati", names(precip)) == 46,
identical(names(precip)[-46], names(precip.O)[-46]))
head(precip)
## Mobile Juneau Phoenix Little Rock Los Angeles Sacramento
## 67.0 54.7 7.0 48.5 14.0 17.2
example(precip)
##
## precip> require(graphics)
##
## precip> dotchart(precip[order(precip)], main = "precip data")
##
## precip> title(sub = "Average annual precipitation (in.)")
##
## precip> ## Old ("wrong") version of dataset (just name change):
## precip> precip.O <- local({
## precip+ p <- precip; names(p)[names(p) == "Cincinnati"] <- "Cincinati" ; p })
##
## precip> stopifnot(all(precip == precip.O),
## precip+ match("Cincinnati", names(precip)) == 46,
## precip+ identical(names(precip)[-46], names(precip.O)[-46]))
presidents Quarterly Approval Ratings of US Presidents
Description
The (approximately) quarterly approval rating for the President of the United States from the first quarter of 1945 to the last quarter of 1974.
Usage
presidents
Format
A time series of 120 values.
Details
The data are actually a fudged version of the approval ratings. See McNeil's book for details.
Source
The Gallup Organisation.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(stats); require(graphics)
plot(presidents, las = 1, ylab = "Approval rating (%)",
main = "presidents data")
head(presidents)
## [1] NA 87 82 75 63 50
example(presidents)
##
## prsdnt> require(stats); require(graphics)
##
## prsdnt> plot(presidents, las = 1, ylab = "Approval rating (%)",
## prsdnt+ main = "presidents data")
pressure Vapor Pressure of Mercury as a Function of Temperature
Description
Data on the relation between temperature in degrees Celsius and vapor pressure of mercury in millimeters (of mercury).
Usage
pressure
Format
A data frame with 19 observations on 2 variables.
[, 1] temperature numeric temperature (deg C)
[, 2] pressure numeric pressure (mm)
Source
Weast, R. C., ed. (1973) Handbook of Chemistry and Physics. CRC Press.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(graphics)
plot(pressure, xlab = "Temperature (deg C)",
ylab = "Pressure (mm of Hg)",
main = "pressure data: Vapor Pressure of Mercury")
plot(pressure, xlab = "Temperature (deg C)", log = "y",
ylab = "Pressure (mm of Hg)",
main = "pressure data: Vapor Pressure of Mercury")
head(pressure)
example(pressure)
##
## pressr> require(graphics)
##
## pressr> plot(pressure, xlab = "Temperature (deg C)",
## pressr+ ylab = "Pressure (mm of Hg)",
## pressr+ main = "pressure data: Vapor Pressure of Mercury")
##
## pressr> plot(pressure, xlab = "Temperature (deg C)", log = "y",
## pressr+ ylab = "Pressure (mm of Hg)",
## pressr+ main = "pressure data: Vapor Pressure of Mercury")
quakes Locations of Earthquakes off Fiji
Description
The data set gives the locations of 1000 seismic events of magnitude MB > 4.0. The events occurred in a cube near Fiji since 1964.
Usage
quakes
Format
A data frame with 1000 observations on 5 variables.
[,1] lat numeric Latitude of event
[,2] long numeric Longitude
[,3] depth numeric Depth (km)
[,4] mag numeric Richter Magnitude
[,5] stations numeric Number of stations reporting
Details
There are two clear planes of seismic activity. One is a major plate junction; the other is the Tonga trench off New Zealand. These data constitute a subsample from a larger dataset containing 5000 observations.
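The two planes of seismic activity can be glimpsed by coding depth into the latitude–longitude plot. A minimal sketch (the colour split at 400 km depth is an arbitrary choice, not from the source):

```r
## Colour events by depth: shallow (< 400 km) vs deep; the cut is arbitrary
require(graphics)
with(quakes, plot(long, lat, pch = 20,
                  col = ifelse(depth < 400, "blue", "red"),
                  main = "Fiji earthquakes by depth",
                  xlab = "Longitude", ylab = "Latitude"))
```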
Source
This is one of the Harvard PRIM-H project data sets. They in turn obtained it from Dr. John Woodhouse, Dept. of Geophysics, Harvard University.
Examples
require(graphics)
pairs(quakes, main = "Fiji Earthquakes, N = 1000", cex.main = 1.2, pch = ".")
head(quakes)
example(quakes)
##
## quakes> require(graphics)
##
## quakes> pairs(quakes, main = "Fiji Earthquakes, N = 1000", cex.main = 1.2, pch = ".")
randu Random Numbers from Congruential Generator RANDU
Description
400 triples of successive random numbers were taken from the VAX FORTRAN function RANDU running under VMS 1.5.
Usage
randu
Format
A data frame with 400 observations on 3 variables named x, y and z which give the first, second and third random number in the triple.
Details
In three-dimensional displays it is evident that the triples fall on 15 parallel planes in 3-space. This can be shown theoretically to be true for all triples from the RANDU generator.
These particular 400 triples start 5 apart in the sequence, that is, they are (U[5i+1], U[5i+2], U[5i+3]) for i = 0, ..., 399, and they are rounded to 6 decimal places.
Under VMS versions 2.0 and higher, this problem has been fixed.
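The 15 planes follow from the RANDU recurrence x[k+1] = (2^16 + 3) x[k] mod 2^31: squaring the multiplier modulo 2^31 shows that 9x − 6y + z is an integer for every scaled triple (x, y, z). A sketch checking this on the data (rounding absorbs the 6-decimal truncation); the small number of distinct values corresponds to the parallel planes:

```r
## 9x - 6y + z is (up to rounding) an integer for every RANDU triple;
## the distinct values index the parallel planes in 3-space.
v <- with(randu, 9*x - 6*y + z)
table(round(v))
```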
Source
David Donoho
Examples
## We could re-generate the dataset by the following R code
seed <- as.double(1)
RANDU <- function() {
seed <<- ((2^16 + 3) * seed) %% (2^31)
seed/(2^31)
}
for(i in 1:400) {
U <- c(RANDU(), RANDU(), RANDU(), RANDU(), RANDU())
print(round(U[1:3], 6))
}
head(randu)
example(randu)
##
## randu> ## No test:
## randu> ##D ## We could re-generate the dataset by the following R code
## randu> ##D seed <- as.double(1)
## randu> ##D RANDU <- function() {
## randu> ##D seed <<- ((2^16 + 3) * seed) %% (2^31)
## randu> ##D seed/(2^31)
## randu> ##D }
## randu> ##D for(i in 1:400) {
## randu> ##D U <- c(RANDU(), RANDU(), RANDU(), RANDU(), RANDU())
## randu> ##D print(round(U[1:3], 6))
## randu> ##D }
## randu> ## End(No test)
## randu>
## randu>
## randu>
rivers Lengths of Major North American Rivers
Description
This data set gives the lengths (in miles) of 141 “major” rivers in North America, as compiled by the US Geological Survey.
Usage
rivers
Format
A vector containing 141 observations.
Source
World Almanac and Book of Facts, 1975, page 406.
References
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
head(rivers)
## [1] 735 320 325 392 524 450
rock Measurements on Petroleum Rock Samples
Description
Measurements on 48 rock samples from a petroleum reservoir.
Usage
rock
Format
A data frame with 48 rows and 4 numeric columns.
[,1] area area of pore space, in pixels out of 256 by 256
[,2] peri perimeter in pixels
[,3] shape perimeter/sqrt(area)
[,4] perm permeability in milli-Darcies
Details
Twelve core samples from petroleum reservoirs were sampled by 4 cross-sections. Each core sample was measured for permeability, and each cross-section has total area of pores, total perimeter of pores, and shape.
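Under that reading of the details, permeability was measured once per core while the image features come from the four cross-sections, so each perm value should appear four times among the 48 rows. A sketch to check this (an assumption from the description above, not stated in the source):

```r
## If each of the 12 cores contributes 4 cross-sections, each permeability
## value should repeat 4 times in the 48 rows.
table(rock$perm)
length(unique(rock$perm))
```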
Source
Data from BP Research, image analysis by Ronit Katz, U. Oxford.
head(rock)
sleep Student’s Sleep Data
Description
Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.
Usage
sleep
Format
A data frame with 20 observations on 3 variables.
[, 1] extra numeric increase in hours of sleep
[, 2] group factor drug given
[, 3] ID factor patient ID
Details
The name of the group variable is misleading: the data represent measurements on 10 persons, not two independent groups.
Source
Cushny, A. R. and Peebles, A. R. (1905) The action of optical isomers: II hyoscines. The Journal of Physiology 32, 501–510.
Student (1908) The probable error of the mean. Biometrika, 6, 20.
References
Scheffé, Henry (1959) The Analysis of Variance. New York, NY: Wiley.
Examples
require(stats)
## Student's paired t-test
with(sleep,
t.test(extra[group == 1],
extra[group == 2], paired = TRUE))
## The sleep *prolongations*
sleep1 <- with(sleep, extra[group == 2] - extra[group == 1])
summary(sleep1)
stripchart(sleep1, method = "stack", xlab = "hours",
main = "Sleep prolongation (n = 10)")
boxplot(sleep1, horizontal = TRUE, add = TRUE,
at = .6, pars = list(boxwex = 0.5, staplewex = 0.25))
head(sleep)
example(sleep)
##
## sleep> require(stats)
##
## sleep> ## Student's paired t-test
## sleep> with(sleep,
## sleep+ t.test(extra[group == 1],
## sleep+ extra[group == 2], paired = TRUE))
##
## Paired t-test
##
## data: extra[group == 1] and extra[group == 2]
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean of the differences
## -1.58
##
##
## sleep> ## The sleep *prolongations*
## sleep> sleep1 <- with(sleep, extra[group == 2] - extra[group == 1])
##
## sleep> summary(sleep1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.05 1.30 1.58 1.70 4.60
##
## sleep> stripchart(sleep1, method = "stack", xlab = "hours",
## sleep+ main = "Sleep prolongation (n = 10)")
##
## sleep> boxplot(sleep1, horizontal = TRUE, add = TRUE,
## sleep+ at = .6, pars = list(boxwex = 0.5, staplewex = 0.25))
stackloss Brownlee’s Stack Loss Plant Data
Description
Operational data of a plant for the oxidation of ammonia to nitric acid.
Usage
stackloss
stack.x
stack.loss
Format
stackloss is a data frame with 21 observations on 4 variables.
[,1] Air Flow Flow of cooling air
[,2] Water Temp Cooling Water Inlet Temperature
[,3] Acid Conc. Concentration of acid [per 1000, minus 500]
[,4] stack.loss Stack loss
For compatibility with S-PLUS, the data sets stack.x, a matrix with the first three (independent) variables of the data frame, and stack.loss, the numeric vector giving the fourth (dependent) variable, are provided as well.
Details
“Obtained from 21 days of operation of a plant for the oxidation of ammonia (NH3) to nitric acid (HNO3). The nitric oxides produced are absorbed in a countercurrent absorption tower”. (Brownlee, cited by Dodge, slightly reformatted by MM.)
Air Flow represents the rate of operation of the plant. Water Temp is the temperature of cooling water circulated through coils in the absorption tower. Acid Conc. is the concentration of the acid circulating, minus 50, times 10: that is, 89 corresponds to 58.9 per cent acid. stack.loss (the dependent variable) is 10 times the percentage of the ingoing ammonia to the plant that escapes from the absorption column unabsorbed; that is, an (inverse) measure of the over-all efficiency of the plant.
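Under the coding described above, the acid concentration in per cent and the raw stack-loss percentage can be recovered directly. A sketch based on that decoding (the new column names are illustrative):

```r
## Decode: Acid.Conc. is (percent - 50) * 10, so 89 -> 58.9 per cent;
## stack.loss is 10 * the percentage of ammonia escaping unabsorbed.
stack.pct <- transform(stackloss,
                       acid.percent = Acid.Conc./10 + 50,
                       loss.percent = stack.loss/10)
head(stack.pct)
```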
Source
Brownlee, K. A. (1960, 2nd ed. 1965) Statistical Theory and Methodology in Science and Engineering. New York: Wiley. pp. 491–500.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Dodge, Y. (1996) The guinea pig of multiple regression. In: Robust Statistics, Data Analysis, and Computer Intensive Methods; In Honor of Peter Huber's 60th Birthday, 1996, Lecture Notes in Statistics 109, Springer-Verlag, New York.
Examples
require(stats)
summary(lm.stack <- lm(stack.loss ~ stack.x))
head(stackloss)
head(stack.x)
## Air.Flow Water.Temp Acid.Conc.
## [1,] 80 27 89
## [2,] 80 27 88
## [3,] 75 25 90
## [4,] 62 24 87
## [5,] 62 22 87
## [6,] 62 23 87
head(stack.loss)
## [1] 42 37 37 28 18 18
example(stackloss)
##
## stckls> require(stats)
##
## stckls> summary(lm.stack <- lm(stack.loss ~ stack.x))
##
## Call:
## lm(formula = stack.loss ~ stack.x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.2377 -1.7117 -0.4551 2.3614 5.6978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39.9197 11.8960 -3.356 0.00375 **
## stack.xAir.Flow 0.7156 0.1349 5.307 5.8e-05 ***
## stack.xWater.Temp 1.2953 0.3680 3.520 0.00263 **
## stack.xAcid.Conc. -0.1521 0.1563 -0.973 0.34405
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.243 on 17 degrees of freedom
## Multiple R-squared: 0.9136, Adjusted R-squared: 0.8983
## F-statistic: 59.9 on 3 and 17 DF, p-value: 3.016e-09
state US State Facts and Figures
Description
Data sets related to the 50 states of the United States of America.
Usage
state.abb
state.area
state.center
state.division
state.name
state.region
state.x77
Details
R currently contains the following “state” data sets. Note that all data are arranged according to alphabetical order of the state names.
state.abb:
character vector of 2-letter abbreviations for the state names.
state.area:
numeric vector of state areas (in square miles).
state.center:
list with components named x and y giving the approximate geographic center of each state in negative longitude and latitude. Alaska and Hawaii are placed just off the West Coast.
state.division:
factor giving state divisions (New England, Middle Atlantic, South Atlantic, East South Central, West South Central, East North Central, West North Central, Mountain, and Pacific).
state.name:
character vector giving the full state names.
state.region:
factor giving the region (Northeast, South, North Central, West) that each state belongs to.
state.x77:
matrix with 50 rows and 8 columns giving the following statistics in the respective columns.
Population:
population estimate as of July 1, 1975
Income:
per capita income (1974)
Illiteracy:
illiteracy (1970, percent of population)
Life Exp:
life expectancy in years (1969–71)
Murder:
murder and non-negligent manslaughter rate per 100,000 population (1976)
HS Grad:
percent high-school graduates (1970)
Frost:
mean number of days with minimum temperature below freezing (1931–1960) in capital or large city
Area:
land area in square miles
Source
U.S. Department of Commerce, Bureau of the Census (1977) Statistical Abstract of the United States.
U.S. Department of Commerce, Bureau of the Census (1977) County and City Data Book.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
head(state.abb)
## [1] "AL" "AK" "AZ" "AR" "CA" "CO"
head(state.area)
## [1] 51609 589757 113909 53104 158693 104247
head(state.center)
## $x
## [1] -86.7509 -127.2500 -111.6250 -92.2992 -119.7730 -105.5130 -72.3573
## [8] -74.9841 -81.6850 -83.3736 -126.2500 -113.9300 -89.3776 -86.0808
## [15] -93.3714 -98.1156 -84.7674 -92.2724 -68.9801 -76.6459 -71.5800
## [22] -84.6870 -94.6043 -89.8065 -92.5137 -109.3200 -99.5898 -116.8510
## [29] -71.3924 -74.2336 -105.9420 -75.1449 -78.4686 -100.0990 -82.5963
## [36] -97.1239 -120.0680 -77.4500 -71.1244 -80.5056 -99.7238 -86.4560
## [43] -98.7857 -111.3300 -72.5450 -78.2005 -119.7460 -80.6665 -89.9941
## [50] -107.2560
##
## $y
## [1] 32.5901 49.2500 34.2192 34.7336 36.5341 38.6777 41.5928 38.6777 27.8744
## [10] 32.3329 31.7500 43.5648 40.0495 40.0495 41.9358 38.4204 37.3915 30.6181
## [19] 45.6226 39.2778 42.3645 43.1361 46.3943 32.6758 38.3347 46.8230 41.3356
## [28] 39.1063 43.3934 39.9637 34.4764 43.1361 35.4195 47.2517 40.2210 35.5053
## [37] 43.9078 40.9069 41.5928 33.6190 44.3365 35.6767 31.3897 39.1063 44.2508
## [46] 37.5630 47.4231 38.4204 44.5937 43.0504
head(state.division)
## [1] East South Central Pacific Mountain West South Central
## [5] Pacific Mountain
## 9 Levels: New England Middle Atlantic South Atlantic ... Pacific
head(state.name)
## [1] "Alabama" "Alaska" "Arizona" "Arkansas" "California"
## [6] "Colorado"
head(state.region)
## [1] South West West South West West
## Levels: Northeast South North Central West
head(state.x77)
## Population Income Illiteracy Life Exp Murder HS Grad Frost Area
## Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
## Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
## Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
## Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
## California 21198 5114 1.1 71.71 10.3 62.6 20 156361
## Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
example(state)
## Warning in example(state): 'state' has a help file but no examples
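Since state ships no examples, a minimal sketch assembling the parallel vectors into a single data frame (the object and column names here are illustrative, not from the help page):

```r
## Combine the parallel 'state' vectors into one data frame; all are
## ordered alphabetically by state name, as documented above.
states <- data.frame(state.x77,
                     abb      = state.abb,
                     region   = state.region,
                     division = state.division,
                     row.names = state.name)
head(states)
```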
sunspot.month Monthly Sunspot Data, from 1749 to “Present”
Description
Monthly numbers of sunspots, as from the World Data Center, aka SIDC. This is the version of the data that will occasionally be updated when new counts become available.
Usage
sunspot.month
Format
The univariate time series sunspot.year and sunspot.month contain 289 and 2988 observations, respectively. The objects are of class "ts".
Author(s)
R
Source
WDC-SILSO, Solar Influences Data Analysis Center (SIDC), Royal Observatory of Belgium, Av. Circulaire, 3, B-1180 BRUSSELS Currently at http://www.sidc.be/silso/datafiles
See Also
sunspot.month is a longer version of sunspots; the latter runs until 1983 and is kept fixed (for reproducibility as example dataset).
Examples
require(stats); require(graphics)
## Compare the monthly series
plot (sunspot.month,
main="sunspot.month & sunspots [package'datasets']", col=2)
lines(sunspots) # -> faint differences where they overlap
## Now look at the difference :
all(tsp(sunspots) [c(1,3)] ==
tsp(sunspot.month)[c(1,3)]) ## Start & Periodicity are the same
n1 <- length(sunspots)
table(eq <- sunspots == sunspot.month[1:n1]) #> 132 are different !
i <- which(!eq)
rug(time(eq)[i])
s1 <- sunspots[i] ; s2 <- sunspot.month[i]
cbind(i = i, time = time(sunspots)[i], sunspots = s1, ss.month = s2,
perc.diff = round(100*2*abs(s1-s2)/(s1+s2), 1))
## How to recreate the "old" sunspot.month (R <= 3.0.3):
.sunspot.diff <- cbind(
i = c(1202L, 1256L, 1258L, 1301L, 1407L, 1429L, 1452L, 1455L,
1663L, 2151L, 2329L, 2498L, 2594L, 2694L, 2819L),
res10 = c(1L, 1L, 1L, -1L, -1L, -1L, 1L, -1L,
1L, 1L, 1L, 1L, 1L, 20L, 1L))
ssm0 <- sunspot.month[1:2988]
with(as.data.frame(.sunspot.diff), ssm0[i] <<- ssm0[i] - res10/10)
sunspot.month.0 <- ts(ssm0, start = 1749, frequency = 12)
head(sunspot.month)
## [1] 58.0 62.6 70.0 55.7 85.0 83.5
example(sunspot.month)
##
## snspt.> require(stats); require(graphics)
##
## snspt.> ## Compare the monthly series
## snspt.> plot (sunspot.month,
## snspt.+ main="sunspot.month & sunspots [package'datasets']", col=2)
##
## snspt.> lines(sunspots) # -> faint differences where they overlap
##
## snspt.> ## Now look at the difference :
## snspt.> all(tsp(sunspots) [c(1,3)] ==
## snspt.+ tsp(sunspot.month)[c(1,3)]) ## Start & Periodicity are the same
## [1] TRUE
##
## snspt.> n1 <- length(sunspots)
##
## snspt.> table(eq <- sunspots == sunspot.month[1:n1]) #> 132 are different !
##
## FALSE TRUE
## 143 2677
##
## snspt.> i <- which(!eq)
##
## snspt.> rug(time(eq)[i])
##
## snspt.> s1 <- sunspots[i] ; s2 <- sunspot.month[i]
##
## snspt.> cbind(i = i, time = time(sunspots)[i], sunspots = s1, ss.month = s2,
## snspt.+ perc.diff = round(100*2*abs(s1-s2)/(s1+s2), 1))
## i time sunspots ss.month perc.diff
## [1,] 55 1753.500 22.2 22.0 0.9
## [2,] 838 1818.750 31.7 31.6 0.3
## [3,] 841 1819.000 32.5 32.8 0.9
## [4,] 862 1820.750 9.0 8.9 1.1
## [5,] 864 1820.917 9.7 9.1 6.4
## [6,] 866 1821.083 4.3 4.2 2.4
## [7,] 876 1821.917 0.0 0.2 200.0
## [8,] 901 1824.000 21.6 21.7 0.5
## [9,] 917 1825.333 15.4 15.5 0.6
## [10,] 920 1825.583 25.4 25.7 1.2
## [11,] 943 1827.500 42.9 42.3 1.4
## [12,] 946 1827.750 57.2 56.1 1.9
## [13,] 955 1828.500 54.3 54.2 0.2
## [14,] 960 1828.917 46.6 46.9 0.6
## [15,] 965 1829.333 67.5 67.4 0.1
## [16,] 968 1829.583 78.3 77.6 0.9
## [17,] 976 1830.250 107.1 106.3 0.7
## [18,] 988 1831.250 54.6 54.5 0.2
## [19,] 992 1831.583 54.9 55.0 0.2
## [20,] 994 1831.750 46.2 46.3 0.2
## [21,] 998 1832.083 55.5 55.6 0.2
## [22,] 1003 1832.500 13.9 14.0 0.7
## [23,] 1047 1836.167 98.1 98.2 0.1
## [24,] 1061 1837.333 111.3 111.7 0.4
## [25,] 1081 1839.000 107.6 105.6 1.9
## [26,] 1087 1839.500 84.7 84.8 0.1
## [27,] 1090 1839.750 90.8 90.9 0.1
## [28,] 1092 1839.917 63.6 63.7 0.2
## [29,] 1095 1840.167 55.5 67.8 20.0
## [30,] 1102 1840.750 49.8 55.0 9.9
## [31,] 1105 1841.000 24.0 24.1 0.4
## [32,] 1108 1841.250 42.6 40.2 5.8
## [33,] 1109 1841.333 67.4 67.5 0.1
## [34,] 1113 1841.667 35.1 36.5 3.9
## [35,] 1124 1842.583 26.5 26.6 0.4
## [36,] 1125 1842.667 18.5 18.4 0.5
## [37,] 1132 1843.250 8.8 9.5 7.7
## [38,] 1145 1844.333 12.0 11.6 3.4
## [39,] 1149 1844.667 6.9 7.0 1.4
## [40,] 1156 1845.250 56.9 57.0 0.2
## [41,] 1168 1846.250 69.2 69.3 0.1
## [42,] 1185 1847.667 161.2 160.9 0.2
## [43,] 1191 1848.167 108.9 108.6 0.3
## [44,] 1194 1848.417 123.8 129.0 4.1
## [45,] 1196 1848.583 132.5 132.6 0.1
## [46,] 1200 1848.917 159.9 159.5 0.3
## [47,] 1201 1849.000 156.7 157.0 0.2
## [48,] 1202 1849.083 131.7 131.8 0.1
## [49,] 1203 1849.167 96.5 96.2 0.3
## [50,] 1206 1849.417 81.2 81.1 0.1
## [51,] 1208 1849.583 61.3 67.7 9.9
## [52,] 1211 1849.833 99.7 99.0 0.7
## [53,] 1224 1850.917 60.0 61.0 1.7
## [54,] 1235 1851.833 50.9 51.0 0.2
## [55,] 1238 1852.083 67.5 66.4 1.6
## [56,] 1243 1852.500 42.0 42.1 0.2
## [57,] 1256 1853.583 50.4 50.5 0.2
## [58,] 1258 1853.750 42.3 42.4 0.2
## [59,] 1264 1854.250 26.4 26.5 0.4
## [60,] 1270 1854.750 12.7 12.6 0.8
## [61,] 1272 1854.917 21.4 21.6 0.9
## [62,] 1282 1855.750 9.7 9.6 1.0
## [63,] 1283 1855.833 4.3 4.2 2.4
## [64,] 1290 1856.417 5.0 5.2 3.9
## [65,] 1301 1857.333 29.2 28.5 2.4
## [66,] 1333 1860.000 81.5 82.4 1.1
## [67,] 1334 1860.083 88.0 88.3 0.3
## [68,] 1346 1861.083 77.8 77.7 0.1
## [69,] 1350 1861.417 87.8 88.1 0.3
## [70,] 1366 1862.750 42.0 41.9 0.2
## [71,] 1407 1866.167 24.6 24.5 0.4
## [72,] 1424 1867.583 4.9 4.8 2.1
## [73,] 1427 1867.833 9.3 9.6 3.2
## [74,] 1429 1868.000 15.6 15.5 0.6
## [75,] 1430 1868.083 15.8 15.7 0.6
## [76,] 1435 1868.500 28.6 29.0 1.4
## [77,] 1437 1868.667 43.8 47.2 7.5
## [78,] 1438 1868.750 61.7 61.6 0.2
## [79,] 1442 1869.083 59.3 59.9 1.0
## [80,] 1445 1869.333 104.0 103.9 0.1
## [81,] 1450 1869.750 59.4 59.3 0.2
## [82,] 1451 1869.833 77.4 78.1 0.9
## [83,] 1452 1869.917 104.3 104.4 0.1
## [84,] 1455 1870.167 159.4 157.5 1.2
## [85,] 1472 1871.583 110.0 110.1 0.1
## [86,] 1476 1871.917 90.3 90.4 0.1
## [87,] 1486 1872.750 103.5 102.6 0.9
## [88,] 1497 1873.667 47.5 47.1 0.8
## [89,] 1498 1873.750 47.4 47.1 0.6
## [90,] 1514 1875.083 22.2 21.5 3.2
## [91,] 1527 1876.167 31.2 30.6 1.9
## [92,] 1539 1877.167 11.7 11.9 1.7
## [93,] 1541 1877.333 21.2 21.6 1.9
## [94,] 1542 1877.417 13.4 14.2 5.8
## [95,] 1543 1877.500 5.9 6.0 1.7
## [96,] 1545 1877.667 16.4 16.9 3.0
## [97,] 1547 1877.833 14.5 14.2 2.1
## [98,] 1548 1877.917 2.3 2.2 4.4
## [99,] 1550 1878.083 6.0 6.6 9.5
## [100,] 1553 1878.333 5.8 5.9 1.7
## [101,] 1561 1879.000 0.8 1.0 22.2
## [102,] 1571 1879.833 12.9 13.1 1.5
## [103,] 1572 1879.917 7.2 7.3 1.4
## [104,] 1574 1880.083 27.5 27.2 1.1
## [105,] 1575 1880.167 19.5 19.3 1.0
## [106,] 1576 1880.250 19.3 19.5 1.0
## [107,] 1588 1881.250 51.7 51.6 0.2
## [108,] 1592 1881.583 58.0 58.4 0.7
## [109,] 1594 1881.750 64.0 64.4 0.6
## [110,] 1598 1882.083 69.3 69.5 0.3
## [111,] 1599 1882.167 67.5 66.8 1.0
## [112,] 1613 1883.333 32.1 31.5 1.9
## [113,] 1614 1883.417 76.5 76.3 0.3
## [114,] 1623 1884.167 86.8 87.5 0.8
## [115,] 1643 1885.833 33.3 30.9 7.5
## [116,] 1656 1886.917 12.4 13.0 4.7
## [117,] 1663 1887.500 23.3 23.4 0.4
## [118,] 1683 1889.167 7.0 6.7 4.4
## [119,] 1687 1889.500 9.7 9.4 3.1
## [120,] 1712 1891.583 33.2 33.0 0.6
## [121,] 1716 1891.917 32.3 32.5 0.6
## [122,] 1723 1892.500 76.8 76.5 0.4
## [123,] 1734 1893.417 88.2 89.9 1.9
## [124,] 1735 1893.500 88.8 88.6 0.2
## [125,] 1738 1893.750 79.7 80.0 0.4
## [126,] 1774 1896.750 28.4 28.7 1.1
## [127,] 1837 1902.000 5.2 5.5 5.6
## [128,] 2126 1926.083 70.0 69.9 0.1
## [129,] 2151 1928.167 85.4 85.5 0.1
## [130,] 2153 1928.333 76.9 77.0 0.1
## [131,] 2162 1929.083 64.1 62.8 2.0
## [132,] 2174 1930.083 49.2 49.9 1.4
## [133,] 2233 1935.000 18.9 18.6 1.6
## [134,] 2315 1941.833 38.3 38.4 0.3
## [135,] 2329 1943.000 12.4 12.5 0.8
## [136,] 2378 1947.083 113.4 133.4 16.2
## [137,] 2427 1951.167 59.9 55.9 6.9
## [138,] 2498 1957.083 130.2 130.3 0.1
## [139,] 2552 1961.583 55.9 55.8 0.2
## [140,] 2556 1961.917 40.0 39.9 0.3
## [141,] 2594 1965.083 14.2 14.3 0.7
## [142,] 2790 1981.417 90.0 90.9 1.0
## [143,] 2819 1983.833 33.3 33.4 0.3
##
## snspt.> ## How to recreate the "old" sunspot.month (R <= 3.0.3):
## snspt.> .sunspot.diff <- cbind(
## snspt.+ i = c(1202L, 1256L, 1258L, 1301L, 1407L, 1429L, 1452L, 1455L,
## snspt.+ 1663L, 2151L, 2329L, 2498L, 2594L, 2694L, 2819L),
## snspt.+ res10 = c(1L, 1L, 1L, -1L, -1L, -1L, 1L, -1L,
## snspt.+ 1L, 1L, 1L, 1L, 1L, 20L, 1L))
##
## snspt.> ssm0 <- sunspot.month[1:2988]
##
## snspt.> with(as.data.frame(.sunspot.diff), ssm0[i] <<- ssm0[i] - res10/10)
##
## snspt.> sunspot.month.0 <- ts(ssm0, start = 1749, frequency = 12)
sunspot.year Yearly Sunspot Data, 1700-1988
Description
Yearly numbers of sunspots from 1700 to 1988 (rounded to one digit).
Note that monthly numbers are available as sunspot.month, though starting slightly later.
Usage
sunspot.year
Format
The univariate time series sunspot.year contains 289 observations, and is of class "ts".
Source
H. Tong (1996) Non-Linear Time Series. Clarendon Press, Oxford, p. 471.
See Also
For monthly sunspot numbers, see sunspot.month and sunspots.
Regularly updated yearly sunspot numbers are available from WDC-SILSO, Royal Observatory of Belgium, at http://www.sidc.be/silso/datafiles
Examples
utils::str(sm <- sunspots)# the monthly version we keep unchanged
utils::str(sy <- sunspot.year)
## The common time interval
(t1 <- c(max(start(sm), start(sy)), 1)) # Jan 1749
(t2 <- c(min( end(sm)[1],end(sy)[1]), 12)) # Dec 1983
s.m <- window(sm, start=t1, end=t2)
s.y <- window(sy, start=t1, end=t2[1]) # {irrelevant warning}
stopifnot(length(s.y) * 12 == length(s.m),
## The yearly series *is* close to the averages of the monthly one:
all.equal(s.y, aggregate(s.m, FUN = mean), tol = 0.0020))
## NOTE: Strangely, correctly weighting the number of days per month
## (using 28.25 for February) is *not* closer than the simple mean:
ndays <- c(31, 28.25, rep(c(31,30, 31,30, 31), 2))
all.equal(s.y, aggregate(s.m, FUN = mean)) # 0.0013
all.equal(s.y, aggregate(s.m, FUN = weighted.mean, w = ndays)) # 0.0017
head(sunspot.year)
## [1] 5 11 16 23 36 58
example(sunspot.year)
##
## snspt.> utils::str(sm <- sunspots)# the monthly version we keep unchanged
## Time-Series [1:2820] from 1749 to 1984: 58 62.6 70 55.7 85 83.5 94.8 66.3 75.9 75.5 ...
##
## snspt.> utils::str(sy <- sunspot.year)
## Time-Series [1:289] from 1700 to 1988: 5 11 16 23 36 58 29 20 10 8 ...
##
## snspt.> ## The common time interval
## snspt.> (t1 <- c(max(start(sm), start(sy)), 1)) # Jan 1749
## [1] 1749 1
##
## snspt.> (t2 <- c(min( end(sm)[1],end(sy)[1]), 12)) # Dec 1983
## [1] 1983 12
##
## snspt.> s.m <- window(sm, start=t1, end=t2)
##
## snspt.> s.y <- window(sy, start=t1, end=t2[1]) # {irrelevant warning}
##
## snspt.> stopifnot(length(s.y) * 12 == length(s.m),
## snspt.+ ## The yearly series *is* close to the averages of the monthly one:
## snspt.+ all.equal(s.y, aggregate(s.m, FUN = mean), tol = 0.0020))
##
## snspt.> ## NOTE: Strangely, correctly weighting the number of days per month
## snspt.> ## (using 28.25 for February) is *not* closer than the simple mean:
## snspt.> ndays <- c(31, 28.25, rep(c(31,30, 31,30, 31), 2))
##
## snspt.> all.equal(s.y, aggregate(s.m, FUN = mean)) # 0.0013
## [1] "Mean relative difference: 0.001312539"
##
## snspt.> all.equal(s.y, aggregate(s.m, FUN = weighted.mean, w = ndays)) # 0.0017
## [1] "Mean relative difference: 0.001692215"
sunspots Monthly Sunspot Numbers, 1749-1983
Description
Monthly mean relative sunspot numbers from 1749 to 1983. Collected at Swiss Federal Observatory, Zurich until 1960, then Tokyo Astronomical Observatory.
Usage
sunspots
Format
A time series of monthly data from 1749 to 1983.
Source
Andrews, D. F. and Herzberg, A. M. (1985) Data: A Collection of Problems from Many Fields for the Student and Research Worker. New York: Springer-Verlag.
See Also
sunspot.month has a longer (and slightly different) series; sunspot.year is a much shorter one. See there for getting more current sunspot numbers.
Examples
require(graphics)
plot(sunspots, main = "sunspots data", xlab = "Year",
ylab = "Monthly sunspot numbers")
head(sunspots)
## [1] 58.0 62.6 70.0 55.7 85.0 83.5
example(sunspots)
##
## snspts> require(graphics)
##
## snspts> plot(sunspots, main = "sunspots data", xlab = "Year",
## snspts+ ylab = "Monthly sunspot numbers")
swiss Swiss Fertility and Socioeconomic Indicators (1888) Data
Description
Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
Usage
swiss
Format
A data frame with 47 observations on 6 variables, each of which is in percent, i.e., in [0, 100].
[,1] Fertility Ig, ‘common standardized fertility measure’
[,2] Agriculture % of males involved in agriculture as occupation
[,3] Examination % draftees receiving highest mark on army examination
[,4] Education % education beyond primary school for draftees.
[,5] Catholic % ‘catholic’ (as opposed to ‘protestant’).
[,6] Infant.Mortality % of live births who live less than 1 year.
All variables but ‘Fertility’ give proportions of the population.
Details
(paraphrasing Mosteller and Tukey):
Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries.
The data collected are for 47 French-speaking “provinces” at about 1888.
Here, all variables are scaled to [0, 100], where in the original, all but "Catholic" were scaled to [0, 1].
Note
Files for all 182 districts in 1888 and other years have been available at https://opr.princeton.edu/archive/pefp/switz.aspx.
They state that variables Examination and Education are averages for 1887, 1888 and 1889.
Source
Project “16P5”, pages 549–551 in
Mosteller, F. and Tukey, J. W. (1977) Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley, Reading Mass.
indicating their source as “Data used by permission of Franice van de Walle. Office of Population Research, Princeton University, 1976. Unpublished data assembled under NICHD contract number No 1-HD-O-2077.”
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Examples
require(stats); require(graphics)
pairs(swiss, panel = panel.smooth, main = "swiss data",
col = 3 + (swiss$Catholic > 50))
summary(lm(Fertility ~ . , data = swiss))
head(swiss)
example(swiss)
##
## swiss> require(stats); require(graphics)
##
## swiss> pairs(swiss, panel = panel.smooth, main = "swiss data",
## swiss+ col = 3 + (swiss$Catholic > 50))
##
## swiss> summary(lm(Fertility ~ . , data = swiss))
##
## Call:
## lm(formula = Fertility ~ ., data = swiss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.2743 -5.2617 0.5032 4.1198 15.3213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66.91518 10.70604 6.250 1.91e-07 ***
## Agriculture -0.17211 0.07030 -2.448 0.01873 *
## Examination -0.25801 0.25388 -1.016 0.31546
## Education -0.87094 0.18303 -4.758 2.43e-05 ***
## Catholic 0.10412 0.03526 2.953 0.00519 **
## Infant.Mortality 1.07705 0.38172 2.822 0.00734 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.165 on 41 degrees of freedom
## Multiple R-squared: 0.7067, Adjusted R-squared: 0.671
## F-statistic: 19.76 on 5 and 41 DF, p-value: 5.594e-10
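The Format section above states that every swiss variable lies in [0, 100]. A quick sanity check of that claim (a sketch; not part of the original help page):

```r
## Verify the claimed [0, 100] scaling of all six swiss variables
rng <- sapply(swiss, range)        # 2 x 6 matrix of per-column min / max
stopifnot(all(rng >= 0), all(rng <= 100))
rng                                # Catholic spans nearly the full range
```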
treering Yearly Treering Data, -6000-1979
Description
Contains normalized tree-ring widths in dimensionless units.
Usage
treering
Format
A univariate time series with 7981 observations. The object is of class "ts".
Each tree ring corresponds to one year.
Details
The data were recorded by Donald A. Graybill, 1980, from Gt Basin Bristlecone Pine 2805M, 3726-11810 in Methuselah Walk, California.
Source
Time Series Data Library: http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/, series ‘CA535.DAT’
References
For some photos of Methuselah Walk see https://web.archive.org/web/20110523225828/http://www.ltrr.arizona.edu/~hallman/sitephotos/meth.html
head(treering)
## [1] 1.345 1.077 1.545 1.319 1.413 1.069
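The treering help page ships without an Examples section; a minimal plotting sketch (not from the original documentation) shows the last millennium of the series:

```r
require(graphics)
## Widths are normalized, so a dashed line at 1 marks an average-width ring
plot(window(treering, start = 1000), ylab = "normalized ring width",
     main = "treering data, years 1000-1979")
abline(h = 1, lty = 2)
```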
trees Diameter, Height and Volume for Black Cherry Trees
Description
This data set provides measurements of the diameter, height and volume of timber in 31 felled black cherry trees. Note that the diameter (in inches) is erroneously labelled Girth in the data. It is measured at 4 ft 6 in above the ground.
Usage
trees
Format
A data frame with 31 observations on 3 variables.
[,1] Girth numeric Tree diameter (rather than girth, actually) in inches
[,2] Height numeric Height in ft
[,3] Volume numeric Volume of timber in cubic ft
Source
Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.
References
Atkinson, A. C. (1985) Plots, Transformations and Regression. Oxford University Press.
Examples
require(stats); require(graphics)
pairs(trees, panel = panel.smooth, main = "trees data")
plot(Volume ~ Girth, data = trees, log = "xy")
coplot(log(Volume) ~ log(Girth) | Height, data = trees,
panel = panel.smooth)
summary(fm1 <- lm(log(Volume) ~ log(Girth), data = trees))
summary(fm2 <- update(fm1, ~ . + log(Height), data = trees))
step(fm2)
## i.e., Volume ~= c * Height * Girth^2 seems reasonable
head(trees)
example(trees)
##
## trees> require(stats); require(graphics)
##
## trees> pairs(trees, panel = panel.smooth, main = "trees data")
##
## trees> plot(Volume ~ Girth, data = trees, log = "xy")
##
## trees> coplot(log(Volume) ~ log(Girth) | Height, data = trees,
## trees+ panel = panel.smooth)
##
## trees> summary(fm1 <- lm(log(Volume) ~ log(Girth), data = trees))
##
## Call:
## lm(formula = log(Volume) ~ log(Girth), data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.205999 -0.068702 0.001011 0.072585 0.247963
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.35332 0.23066 -10.20 4.18e-11 ***
## log(Girth) 2.19997 0.08983 24.49 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.115 on 29 degrees of freedom
## Multiple R-squared: 0.9539, Adjusted R-squared: 0.9523
## F-statistic: 599.7 on 1 and 29 DF, p-value: < 2.2e-16
##
##
## trees> summary(fm2 <- update(fm1, ~ . + log(Height), data = trees))
##
## Call:
## lm(formula = log(Volume) ~ log(Girth) + log(Height), data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.168561 -0.048488 0.002431 0.063637 0.129223
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.63162 0.79979 -8.292 5.06e-09 ***
## log(Girth) 1.98265 0.07501 26.432 < 2e-16 ***
## log(Height) 1.11712 0.20444 5.464 7.81e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08139 on 28 degrees of freedom
## Multiple R-squared: 0.9777, Adjusted R-squared: 0.9761
## F-statistic: 613.2 on 2 and 28 DF, p-value: < 2.2e-16
##
##
## trees> step(fm2)
## Start: AIC=-152.69
## log(Volume) ~ log(Girth) + log(Height)
##
## Df Sum of Sq RSS AIC
## <none> 0.1855 -152.685
## - log(Height) 1 0.1978 0.3832 -132.185
## - log(Girth) 1 4.6275 4.8130 -53.743
##
## Call:
## lm(formula = log(Volume) ~ log(Girth) + log(Height), data = trees)
##
## Coefficients:
## (Intercept) log(Girth) log(Height)
## -6.632 1.983 1.117
##
##
## trees> ## i.e., Volume ~= c * Height * Girth^2 seems reasonable
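A geometric footnote to the fitted model above (a sketch; not part of the help page): treating each trunk as a cone of diameter Girth (in inches, hence divided by 12) and height Height (in feet) gives a volume that the measured timber exceeds by a roughly constant factor, consistent with Volume ~= c * Height * Girth^2.

```r
## Cone volume in cubic ft: (1/3) * pi * r^2 * h = (pi/12) * d^2 * h,
## with the diameter converted from inches to feet
cone <- with(trees, pi/12 * (Girth/12)^2 * Height)
summary(trees$Volume / cone)   # most ratios lie a little above 1
```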
uspop Populations Recorded by the US Census
Description
This data set gives the population of the United States (in millions) as recorded by the decennial census for the period 1790–1970.
Usage
uspop
Format
A time series of 19 values.
Source
McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
Examples
require(graphics)
plot(uspop, log = "y", main = "uspop data", xlab = "Year",
ylab = "U.S. Population (millions)")
head(uspop)
## [1] 3.93 5.31 7.24 9.64 12.90 17.10
example(uspop)
##
## uspop> require(graphics)
##
## uspop> plot(uspop, log = "y", main = "uspop data", xlab = "Year",
## uspop+ ylab = "U.S. Population (millions)")
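The log-scale plot above is close to linear over the early censuses; a sketch (not in the help page) of the implied average growth rate from the 1790 census through the 1860 census:

```r
## Average annual growth rate over the 70 years between the
## 1st census (1790) and the 8th census (1860)
r <- log(uspop[8] / uspop[1]) / 70
round(100 * r, 1)   # about 3 percent per year
```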
volcano Topographic Information on Auckland’s Maunga Whau Volcano
Description
Maunga Whau (Mt Eden) is one of about 50 volcanos in the Auckland volcanic field. This data set gives topographic information for Maunga Whau on a 10m by 10m grid.
Usage
volcano
Format
A matrix with 87 rows and 61 columns, rows corresponding to grid lines running east to west and columns to grid lines running south to north.
Source
Digitized from a topographic map by Ross Ihaka. These data should not be regarded as accurate.
See Also
filled.contour for a nice plot.
Examples
require(grDevices); require(graphics)
filled.contour(volcano, color.palette = terrain.colors, asp = 1)
title(main = "volcano data: filled contour map")
head(volcano)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 100 100 101 101 101 101 101 100 100 100 101 101 102 102
## [2,] 101 101 102 102 102 102 102 101 101 101 102 102 103 103
## [3,] 102 102 103 103 103 103 103 102 102 102 103 103 104 104
## [4,] 103 103 104 104 104 104 104 103 103 103 103 104 104 104
## [5,] 104 104 105 105 105 105 105 104 104 103 104 104 105 105
## [6,] 105 105 105 106 106 106 106 105 105 104 104 105 105 106
## [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
## [1,] 102 102 103 104 103 102 101 101 102 103 104 104
## [2,] 103 103 104 105 104 103 102 102 103 105 106 106
## [3,] 104 104 105 106 105 104 104 105 106 107 108 110
## [4,] 105 105 106 107 106 106 106 107 108 110 111 114
## [5,] 105 106 107 108 108 108 109 110 112 114 115 118
## [6,] 106 107 109 110 110 112 113 115 116 118 119 121
## [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
## [1,] 105 107 107 107 108 108 110 110 110 110 110 110
## [2,] 107 109 110 110 110 110 111 112 113 114 116 115
## [3,] 111 113 114 115 114 115 116 118 119 119 121 121
## [4,] 117 118 117 119 120 121 122 124 125 126 127 127
## [5,] 121 122 121 123 128 131 129 130 131 131 132 132
## [6,] 124 126 126 129 134 137 137 136 136 135 136 136
## [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
## [1,] 110 110 108 108 108 107 107 108 108 108 108 108
## [2,] 114 112 110 110 110 109 108 109 109 109 109 108
## [3,] 120 118 116 114 112 111 110 110 110 110 109 109
## [4,] 126 124 122 120 117 116 113 111 110 110 110 109
## [5,] 131 130 128 126 122 119 115 114 112 110 110 110
## [6,] 136 135 133 129 126 122 118 116 115 113 111 110
## [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61]
## [1,] 107 107 107 107 106 106 105 105 104 104 103
## [2,] 108 108 108 107 107 106 106 105 105 104 104
## [3,] 109 109 108 108 107 107 106 106 105 105 104
## [4,] 109 109 109 108 108 107 107 106 106 105 105
## [5,] 110 110 109 109 108 107 107 107 106 106 105
## [6,] 110 110 110 109 108 108 108 107 107 106 106
example(volcano)
##
## volcan> require(grDevices); require(graphics)
##
## volcan> filled.contour(volcano, color.palette = terrain.colors, asp = 1)
##
## volcan> title(main = "volcano data: filled contour map")
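A complementary view of the same grid (a sketch adapted from the classic persp() demonstration; not part of this help page):

```r
require(grDevices); require(graphics)
z <- 2 * volcano                 # exaggerate the relief
x <- 10 * (1:nrow(z))            # 10 metre grid spacing
y <- 10 * (1:ncol(z))
persp(x, y, z, theta = 135, phi = 30, col = "green3", scale = FALSE,
      ltheta = -120, shade = 0.75, border = NA, box = FALSE)
```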
warpbreaks The Number of Breaks in Yarn during Weaving
Description
This data set gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn.
Usage
warpbreaks
Format
A data frame with 54 observations on 3 variables.
[,1] breaks numeric The number of breaks
[,2] wool factor The type of wool (A or B)
[,3] tension factor The level of tension (L, M, H)
There are measurements on 9 looms for each of the six types of warp (AL, AM, AH, BL, BM, BH).
Source
Tippett, L. H. C. (1950) Technological Applications of Statistics. Wiley. Page 106.
References
Tukey, J. W. (1977) Exploratory Data Analysis. Addison-Wesley.
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
See Also
xtabs for ways to display these data as a table.
Examples
require(stats); require(graphics)
summary(warpbreaks)
opar <- par(mfrow = c(1, 2), oma = c(0, 0, 1.1, 0))
plot(breaks ~ tension, data = warpbreaks, col = "lightgray",
varwidth = TRUE, subset = wool == "A", main = "Wool A")
plot(breaks ~ tension, data = warpbreaks, col = "lightgray",
varwidth = TRUE, subset = wool == "B", main = "Wool B")
mtext("warpbreaks data", side = 3, outer = TRUE)
par(opar)
summary(fm1 <- lm(breaks ~ wool*tension, data = warpbreaks))
anova(fm1)
head(warpbreaks)
example(warpbreaks)
##
## wrpbrk> require(stats); require(graphics)
##
## wrpbrk> summary(warpbreaks)
## breaks wool tension
## Min. :10.00 A:27 L:18
## 1st Qu.:18.25 B:27 M:18
## Median :26.00 H:18
## Mean :28.15
## 3rd Qu.:34.00
## Max. :70.00
##
## wrpbrk> opar <- par(mfrow = c(1, 2), oma = c(0, 0, 1.1, 0))
##
## wrpbrk> plot(breaks ~ tension, data = warpbreaks, col = "lightgray",
## wrpbrk+ varwidth = TRUE, subset = wool == "A", main = "Wool A")
##
## wrpbrk> plot(breaks ~ tension, data = warpbreaks, col = "lightgray",
## wrpbrk+ varwidth = TRUE, subset = wool == "B", main = "Wool B")
##
## wrpbrk> mtext("warpbreaks data", side = 3, outer = TRUE)
##
## wrpbrk> par(opar)
##
## wrpbrk> summary(fm1 <- lm(breaks ~ wool*tension, data = warpbreaks))
##
## Call:
## lm(formula = breaks ~ wool * tension, data = warpbreaks)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.5556 -6.8889 -0.6667 7.1944 25.4444
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.556 3.647 12.218 2.43e-16 ***
## woolB -16.333 5.157 -3.167 0.002677 **
## tensionM -20.556 5.157 -3.986 0.000228 ***
## tensionH -20.000 5.157 -3.878 0.000320 ***
## woolB:tensionM 21.111 7.294 2.895 0.005698 **
## woolB:tensionH 10.556 7.294 1.447 0.154327
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.94 on 48 degrees of freedom
## Multiple R-squared: 0.3778, Adjusted R-squared: 0.3129
## F-statistic: 5.828 on 5 and 48 DF, p-value: 0.0002772
##
##
## wrpbrk> anova(fm1)
## Analysis of Variance Table
##
## Response: breaks
## Df Sum Sq Mean Sq F value Pr(>F)
## wool 1 450.7 450.67 3.7653 0.0582130 .
## tension 2 2034.3 1017.13 8.4980 0.0006926 ***
## wool:tension 2 1002.8 501.39 4.1891 0.0210442 *
## Residuals 48 5745.1 119.69
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
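As the See Also entry suggests, xtabs gives a compact tabular view of these data; a sketch (not in the help page) of the per-cell mean breaks, using the 9 looms per wool-tension cell noted in the Format section:

```r
## Sum of breaks per wool x tension cell, divided by the 9 replicate looms;
## the A/L cell reproduces the intercept of lm(breaks ~ wool * tension)
xtabs(breaks ~ wool + tension, data = warpbreaks) / 9
```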
women Average Heights and Weights for American Women
Description
This data set gives the average heights and weights for American women aged 30–39.
Usage
women
Format
A data frame with 15 observations on 2 variables.
[,1] height numeric Height (in)
[,2] weight numeric Weight (lbs)
Details
The data set appears to have been taken from the American Society of Actuaries Build and Blood Pressure Study for some (unknown to us) earlier year.
The World Almanac notes: “The figures represent weights in ordinary indoor clothing and shoes, and heights with shoes”.
Source
The World Almanac and Book of Facts, 1975.
References
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
Examples
require(graphics)
plot(women, xlab = "Height (in)", ylab = "Weight (lb)",
main = "women data: American women aged 30-39")
head(women)
example(women)
##
## women> require(graphics)
##
## women> plot(women, xlab = "Height (in)", ylab = "Weight (lb)",
## women+ main = "women data: American women aged 30-39")
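The plot above is close to linear; a least-squares sketch (not part of the help page):

```r
## Over this range, weight rises roughly 3.45 lb per inch of height
fit <- lm(weight ~ height, data = women)
coef(fit)
```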