This document is interactive! You don’t need to start RStudio to run the commands; instead, you just need to enter your commands in interactive displays such as *this one just below*. Try any command (e.g. `1+1`), press **Run code** and see what happens.

In this document, we give you the solution: click on the button “Solution” to reveal a proposed solution. In the next interactive display, compute the square of 2 (\(2^2\)) and put the result in an object called `x`.

`x <- 2^2`

Sometimes, we also give you a hint, and then the solution: click on the button “Hints” to reveal the hint, and on the button “Next hint” to reveal the solution. In the next interactive display, compute the square root of 2 (\(\sqrt{2}\)) and put the result in an object called `y`.

`?sqrt`

`y <- sqrt(2)`

*NB: at any time, you can clear your work by clicking “Start Over” (in the left panel, below the section titles).*

We want to import the file *BirthWeight.txt* (in the folder *data*), which is tab-delimited:

`dat1 <- read.table("BirthWeight.txt", header = TRUE)`

We then check whether it was correctly imported using the function *head()* (NB: you might also want to use the function *View()* in a more interactive fashion).
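Because *BirthWeight.txt* is not reproduced here, the sketch below feeds a few made-up rows to *read.table()* through its `text =` argument, so it runs without the file. The values, and the name `id` for the 4th column, are hypothetical; only the column names `bw`, `bpd` and `ad` come from this tutorial.

```r
# Hypothetical tab-delimited content standing in for BirthWeight.txt
# (made-up values; "id" is an assumed name for the 4th column)
txt <- "bw\tbpd\tad\tid
2350\t88\t92\t1
2450\t91\t98\t2
1950\t85\t87\t3
3100\t94\t105\t4"

# header = TRUE reads the first line as column names;
# sep = "\t" states the tab delimiter explicitly
dat_demo <- read.table(text = txt, header = TRUE, sep = "\t")

head(dat_demo)  # first rows (up to 6 by default)
str(dat_demo)   # one line per column, with its type
```

If the column names or types printed by *head()* and *str()* look wrong (e.g. the header read as data), the `header` or `sep` arguments are the first things to check.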

**Tip 1:** *read.table()* can handle various field delimiters such as “;” or “,”. They may be specified as follows: `read.table("my_file.txt", sep = ",")`

**Tip 2:** if your files are genuine .csv files, you may import them using *read.csv()* [when fields are delimited by “,”] or *read.csv2()* [when fields are delimited by “;” and decimals are marked with “,”]
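As a small illustration of Tip 2, the sketch below reads made-up semicolon-delimited text with decimal commas (hypothetical values) using *read.csv2()*, then spells out the same options in a *read.table()* call.

```r
# Hypothetical semicolon-delimited content with decimal commas
txt2 <- "bw;bpd
2350;88,5
2450;91,0"

# read.csv2() defaults to header = TRUE, sep = ";" and dec = ","
d2 <- read.csv2(text = txt2)
d2$bpd  # numeric: 88.5 91.0

# The equivalent read.table() call, with every option written out
d3 <- read.table(text = txt2, header = TRUE, sep = ";", dec = ",")
```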

**Tip 3:** you may also directly import .xlsx (or .xls) files by using *ad hoc* functions such as *read.xls()* in the package *gdata*

What is the class of `dat1`?

`dat1`

`?class`

`class(dat1)`

What is the structure of `dat1`?

`dat1`

`?str`

`str(dat1)`

What are the variable names (i.e. the columns) in `dat1`?

`dat1`

`?colnames`

`colnames(dat1)`

Call the variable *bw* with the `$` syntax.

`dat1`

`dat1$bw`

Call the variable *bw* with the `[, "NAME_OF_THE_VARIABLE"]` syntax.

`dat1`

`dat1[, "bw"]`
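Brackets behave slightly differently depending on whether a comma is used. The sketch below uses a small made-up data frame (hypothetical values) to compare the variants.

```r
# Small made-up data frame (hypothetical values)
dat_demo <- data.frame(bw = c(2350, 1950, 3100), bpd = c(88, 85, 94))

dat_demo[, "bw"]                # [rows, columns] indexing: a numeric vector
dat_demo[["bw"]]                # list-style extraction: the same vector as dat_demo$bw
dat_demo["bw"]                  # single brackets without a comma: a one-column data frame
dat_demo[, "bw", drop = FALSE]  # drop = FALSE also keeps the data frame structure
```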

Select the values in *bw* greater than or equal to 2000, and put the result in an object called `sel_dat1`.

`dat1`

`sel_dat1 <- dat1$bw[dat1$bw >= 2000]`
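The selection works in two steps: the comparison builds a logical vector, and indexing with it keeps the positions that are `TRUE`. The sketch below uses made-up values; it also shows that a missing value (`NA`) in the comparison leaks into the result, whereas *which()* drops it.

```r
bw <- c(2350, 1950, 3100, 2000)  # made-up birth weights
keep <- bw >= 2000               # logical vector: TRUE FALSE TRUE TRUE
bw[keep]                         # 2350 3100 2000

bw_na <- c(2350, NA, 1950)       # made-up values with a missing one
bw_na[bw_na >= 2000]             # 2350 NA: comparing NA gives NA
bw_na[which(bw_na >= 2000)]      # 2350: which() keeps only TRUE positions
```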

Finally, subset the data frame so that it contains only the rows where \(bw \geq 2000\) and \(bpd \geq 90\), excluding the 4th column (ID number), and put the result in an object called `sub_dat1`.

`dat1`

`sub_dat1 <- dat1[dat1$bw >= 2000 & dat1$bpd >= 90, c("bw", "bpd", "ad")]`
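The same selection can also be written with base R's *subset()*, where conditions refer to columns directly. The sketch below assumes a made-up data frame in which the 4th column is called `id` (hypothetical name and values). One difference worth knowing: *subset()* drops rows where the condition is `NA`, whereas bracket indexing returns them as rows of `NA`.

```r
# Made-up data frame with the tutorial's column names ("id" is assumed)
dat_demo <- data.frame(bw  = c(2350, 1950, 3100, 2450),
                       bpd = c(91, 85, 94, 88),
                       ad  = c(98, 87, 105, 95),
                       id  = 1:4)

# Bracket version, as in the solution
a <- dat_demo[dat_demo$bw >= 2000 & dat_demo$bpd >= 90, c("bw", "bpd", "ad")]

# subset() version: select = -id drops the 4th column
b <- subset(dat_demo, bw >= 2000 & bpd >= 90, select = -id)
```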

Calculate the quantile of order 0.975 of a Gaussian distribution (mean = 0, standard deviation = 1).

`0.975`

`?qnorm`

`qnorm(0.975)`

Now, could you calculate the quantile of order 0.975 of a Gaussian distribution with mean = 3 and standard deviation = 5?

`0.975`

`?qnorm`

`qnorm(0.975, mean = 3, sd = 5)`
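A Gaussian quantile with mean \(\mu\) and standard deviation \(\sigma\) is the standard quantile rescaled: \(q = \mu + \sigma z_{0.975}\), with \(z_{0.975} \approx 1.96\). The short check below compares both routes.

```r
z  <- qnorm(0.975)                    # standard normal quantile, about 1.96
q1 <- qnorm(0.975, mean = 3, sd = 5)  # direct computation, about 12.8
q2 <- 3 + 5 * z                       # rescaling the standard quantile
all.equal(q1, q2)                     # TRUE
```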

Make a histogram of the distribution of *bpd* (in dataset *dat1*).

`dat1`

`?hist`

`hist(dat1$bpd)`

Compute the average, the median, the standard deviation, as well as the 25% and 75% empirical quantiles (first and third quartiles) of the distribution of *bpd*:

`dat1$bpd`

```
# average
mean(dat1$bpd)
# median
median(dat1$bpd)
# standard deviation
sd(dat1$bpd)
# 25% and 75% quantiles (first and third quartiles)
quantile(dat1$bpd, c(0.25, 0.75))
```
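To see what each function returns, the sketch below applies them to five made-up values; *summary()* gives most of these statistics in one call. Note that *sd()* uses the \(n - 1\) denominator and that *quantile()* uses its default interpolation (type 7).

```r
bpd <- c(80, 85, 90, 95, 100)  # made-up values

mean(bpd)                      # 90
median(bpd)                    # 90
sd(bpd)                        # about 7.91, i.e. sqrt(250 / 4)
quantile(bpd, c(0.25, 0.75))   # 25%: 85, 75%: 95
summary(bpd)                   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```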

Transform the *ad* variable, which is continuous, into two classes, small (\(\leq 100\)) and large (\(> 100\)), and create a factor object called `ad_categories`.

`dat1$ad`

`?cut`

`ad_categories <- cut(dat1$ad, c(-Inf, 100, Inf), labels = c("small", "large"))`
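By default, *cut()* builds right-closed intervals (`right = TRUE`), here \((-\infty, 100]\) and \((100, \infty)\), so a value of exactly 100 is classed as “small”, as the exercise requires. The sketch below checks this on made-up values and shows how `right = FALSE` would move 100 into “large”.

```r
ad <- c(95, 100, 100.5, 110)  # made-up abdominal diameters

cats <- cut(ad, c(-Inf, 100, Inf), labels = c("small", "large"))
cats  # small small large large (a factor)

# right = FALSE gives [-Inf, 100) and [100, Inf): 100 becomes "large"
cats_left <- cut(ad, c(-Inf, 100, Inf), labels = c("small", "large"), right = FALSE)
cats_left  # small large large large
```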

`ad_categories <- cut(dat1$ad, c(-Inf, 100, Inf), labels = c("small", "large"))`

Add the factor `ad_categories` to the existing data frame, as a new column (also) called `ad_categories`.

`ad_categories`

`"Remember the syntax df$new_column <- new_object"`

`dat1$ad_categories <- ad_categories`
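Since *cut()* already returns a factor, it can be stored as a column directly. A quick way to check the new column is *str()* for its type and *table()* for the class counts; the sketch below does this on made-up values.

```r
dat_demo <- data.frame(ad = c(95, 100, 104, 110))  # made-up values
dat_demo$ad_categories <- cut(dat_demo$ad, c(-Inf, 100, Inf),
                              labels = c("small", "large"))

str(dat_demo)                  # ad_categories: Factor w/ 2 levels "small","large"
table(dat_demo$ad_categories)  # small: 2, large: 2
```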

Finally, make a boxplot of *bw* as a function of the latter classes (i.e., small and large).

```
ad_categories <- cut(dat1$ad, c(-Inf, 100, Inf), labels = c("small", "large"))
dat1$ad_categories <- ad_categories
```

`dat1$bw`

`?boxplot`

`boxplot(bw ~ ad_categories, data = dat1)`
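Each box is drawn from five statistics per class: the whisker ends, the two hinges (close to the quartiles) and the median. With `plot = FALSE`, *boxplot()* returns these numbers instead of drawing; the sketch below (made-up data) checks that the middle row matches the per-class medians.

```r
# Made-up data: birth weights and abdominal-diameter classes
dat_demo <- data.frame(
  bw = c(2100, 2350, 2500, 2900, 3100, 3300),
  ad_categories = factor(c("small", "small", "small", "large", "large", "large"),
                         levels = c("small", "large"))
)

# plot = FALSE returns the statistics instead of drawing the plot
b <- boxplot(bw ~ ad_categories, data = dat_demo, plot = FALSE)
b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker

# The middle row matches the medians computed class by class
tapply(dat_demo$bw, dat_demo$ad_categories, median)  # small: 2350, large: 3100
```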

Optional (1): you might want to make a fancier histogram (of the distribution of *bpd*) using the ggplot2 package! Tweak the code below to fit our dataset and objects:

```
library(ggplot2) # loads the package (install it once with install.packages("ggplot2"))
ggplot(data = your_data_frame, aes(x = your_variable_of_interest)) + # creates a ggplot object
  geom_histogram(binwidth = 5, fill = "steelblue", col = "steelblue4") + # adds the histogram
  ggtitle("Distribution of bpd values") # adds a title
```

```
library(ggplot2)
ggplot(data = dat1, aes(x = bpd)) +
  geom_histogram(binwidth = 5, fill = "steelblue", col = "steelblue4") +
  ggtitle("Distribution of bpd values")
```

Optional (2): you might want to make a fancier boxplot (of *bw* as a function of the “small” and “large” *ad* classes) using the ggplot2 package:

```
ad_categories <- cut(dat1$ad, c(-Inf, 100, Inf), labels = c("small", "large"))
dat1$ad_categories <- ad_categories
```

```
library(ggplot2) # loads the package
ggplot(data = your_data_frame, aes(x = your_categories, y = your_variable_of_interest, fill = your_categories)) + # creates a ggplot object
  geom_boxplot(outlier.shape = NA) + # creates boxplots (outliers hidden, since the jittered dots show every point)
  geom_jitter(height = 0, width = 0.1) + # adds dots on top of them
  ggtitle("Distribution of Birth weight as a function of classes of abdominal diameter") + # adds a title
  xlab("Abdominal diameter") + # adds an x-axis label
  ylab("Weight at birth") + # adds a y-axis label
  theme_classic() # uses a simple background
```

```
library(ggplot2)
ggplot(data = dat1, aes(x = ad_categories, y = bw, fill = ad_categories)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(height = 0, width = 0.1) +
  ggtitle("Distribution of Birth weight as a function of classes of abdominal diameter") +
  xlab("Abdominal diameter") +
  ylab("Weight at birth") +
  theme_classic()
```