Chapter 2 Basic Data Types and Structures

2.1 Data types

R works with several kinds of values, which can be stored in variables and manipulated. The class() function is used to check the data type of a value or a variable. The basic data types include:

  • Numeric

These represent numbers, both whole numbers and decimals. They are used in mathematical expressions and quantitative data analysis. The code below assigns 23.5 to the variable a and uses class() to find its data type, which is numeric.

a=23.5
class(a) #check the data type of a 
## [1] "numeric"

A whole number written without a decimal point, for instance 45, 8, 0 or 73, is also numeric unless it is marked as an integer (see Integers below). Run the code below to find the data type of each value.

class(45)
## [1] "numeric"
class(8)
## [1] "numeric"
class(0)
## [1] "numeric"
class(73)
## [1] "numeric"

Practical Exercise

Answer the questions below:

  1. Find the data type of 98.03 using the class() function.
  2. Assign the value 98.03 to a variable height and find the data type of height.
  • Integers

They represent whole numbers without any decimals and are a subclass of numeric. An L is added at the end of a whole number to indicate that it is an integer.

a=23L #add L to show it is an integer
class(a)
## [1] "integer"

Let's store age as an integer. Note the L after the number 27.

age = 27L
class(age)
## [1] "integer"
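The difference between numeric and integer values is easiest to see with typeof(), which reports how R stores a value internally. Below is a minimal sketch. The variable names x and y are illustrative only.

```r
x = 45    # no L suffix: stored as a double-precision number
y = 45L   # L suffix: stored as an integer

class(x)       # "numeric"
class(y)       # "integer"
typeof(x)      # "double"
typeof(y)      # "integer"

# Integers still count as numeric values
is.numeric(y)  # TRUE

# The two hold the same value but are stored differently
x == y           # TRUE
identical(x, y)  # FALSE
```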

Practical Exercise

Answer the questions below:

  1. Find the data type of any whole number using the class() function. Remember to add L after the digits.
  2. There are 27 goats in a field. Assign the number of goats to a variable goats and find the data type of the variable goats.
  • Characters

They represent text strings such as names, sentences and labels. They are enclosed in double quotes (") or single quotes (').

a="DNA"
class(a)
## [1] "character"

Let's store a name as a character value:

name = "Pragya"
class(name)
## [1] "character"

The same applies to the name of an object:

item = "car" # "car" is stored in a variable item
class(item)
## [1] "character"

Character values can contain spaces, for instance:

fullname = "Salman Khan"
class(fullname)
## [1] "character"
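Double and single quotes produce exactly the same character value. The nchar() function counts the characters in a string, including spaces. Here is a short sketch:

```r
a = "DNA"
b = 'DNA'
identical(a, b)        # TRUE: the quote style does not matter

nchar("DNA")           # 3
nchar("Salman Khan")   # 11: the space counts as a character
class(nchar("DNA"))    # "integer"
```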

Practical Exercise

In a code cell, do the following:

  1. Find the data type of the value "school" using the class() function.
  2. Assign your first name to a variable firstname and find its data type. Remember to enclose it in quotation marks
  3. Assign your full name to a variable full_name and find its data type. For instance, if your name is "Vipin Patel", assign it as full_name = "Vipin Patel" and find its data type. Remember to enclose the value in quotation marks, since it is a character data type.
  • Logical

They represent Boolean values, which can take only one of two distinct values: TRUE or FALSE.

a=TRUE # a logical value is either TRUE or FALSE
class(a)
## [1] "logical"

Assigning FALSE instead:

b = FALSE
class(b)
## [1] "logical"
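Logical values are most often produced by comparisons. In arithmetic, R treats TRUE as 1 and FALSE as 0. Here is a brief sketch. The age value is illustrative only.

```r
age = 27L
is_adult = age >= 18   # a comparison returns a logical value
is_adult               # TRUE
class(is_adult)        # "logical"

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(c(TRUE, FALSE, TRUE))  # 2
```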

Practical Exercise

Assign a TRUE to a variable grateful and find the data type of the variable.

  • Complex

They represent complex numbers, which have a real part and an imaginary part.

a=2+3i # the imaginary part is written with the suffix i
class(a)
## [1] "complex"

In 2+3i, 2 is the real part while 3i is the imaginary part. Complex numbers can also be created with the complex() function, passing the real and imaginary parts as arguments.

z = complex(real = 3, imaginary = 7)
print(z) # show the complex value
## [1] 3+7i
class(z) #confirm that it is a complex number 
## [1] "complex"

Let's try a few more complex values:

  1. 2+5i
z = complex(real=2, imaginary = 5)
print(z)
## [1] 2+5i
class(z)
## [1] "complex"
  2. 7 + 6i
m=complex(real=7, imaginary = 6)
print(m)
## [1] 7+6i
class(m)
## [1] "complex"
  3. 4i - 1
b = 4i-1
print(b)
## [1] -1+4i
class(b)
## [1] "complex"

A complex value can consist of an imaginary part only, with no real number. In that case, R assumes the real part is 0 (zero). For instance:

h = 3i
print(h)
## [1] 0+3i
class(h)
## [1] "complex"
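The real and imaginary parts of a complex number can be extracted with Re() and Im(). Mod() returns the number's modulus, which is its distance from zero. Here is a short sketch:

```r
z = complex(real = 3, imaginary = 7)
Re(z)          # 3
Im(z)          # 7
class(Re(z))   # "numeric": each part is an ordinary number

# Modulus: sqrt(3^2 + 4^2) = 5
Mod(3 + 4i)    # 5
```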

Practical Exercise

Find the data type of each of the following values. One of them is numeric.

  1. 3i + 8
  2. 5 - 1i
  3. 4i
  4. 12
  • Raw

They store data as a vector of bytes in raw form, with no interpretation applied. They are used to store binary data. For example:

a=charToRaw("DNA")
print(a)
## [1] 44 4e 41
class(a)
## [1] "raw"
# convert back to character 
b=rawToChar(a)
class(b)
## [1] "character"

When converted to the raw data type, "Hello World" is represented by the bytes below, one hexadecimal byte per character:

binary_data = charToRaw("Hello World")
print(binary_data) 
##  [1] 48 65 6c 6c 6f 20 57 6f 72 6c 64
class(binary_data)
## [1] "raw"

Numeric values can also be converted to raw vectors with as.raw(). The number 27 becomes the byte 1b, which is 27 written in hexadecimal:

age=as.raw(27)
print(age)
## [1] 1b
class(age)
## [1] "raw"
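Each raw byte corresponds to a number from 0 to 255, so as.integer() can show the decimal value of each byte. The sketch below converts a name to bytes and back again:

```r
bytes = charToRaw("Vipin")
bytes               # 56 69 70 69 6e (hexadecimal)
as.integer(bytes)   # 86 105 112 105 110 (the same bytes in decimal)
rawToChar(bytes)    # "Vipin"

# The byte 1b converted back to a number
as.integer(as.raw(27))  # 27
```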

Practical Exercise

Convert the following values to the raw data type. Hint: use the charToRaw() function for character values and as.raw() for other data types.

  1. "Vipin"
  2. 27
  3. 69.0
  4. FALSE
  5. 12L

2.2 Data Structures

A data structure organizes one or more data values in a specific layout. The data structures in R covered in this chapter include:

  • Vector

  • Matrix

  • Data frame

2.2.1 Vector

A vector is a single object that holds a collection of values of the same data type. Vectors are versatile and form the basis of many operations in statistics and data manipulation, so a good knowledge of vectors is important for effective programming in R. Vectors are created with the c() function. Here is an example of a vector:

marks = c(23, 67, 98, 34, 98, 21)
print(marks) # print to the console
## [1] 23 67 98 34 98 21

Practical Exercise

Create a vector named ages containing the values 21, 32, 22, 24, 27, 54, 20 and 13, and print it to the console.

The class() function returns the data type of the values stored in a vector.

marks = c(23, 67, 98, 34, 98, 21)
class(marks)
## [1] "numeric"

The vector marks consists of numeric values only.

The is.vector() function checks whether a variable is a vector. It returns a Boolean value: TRUE if the variable is a vector and FALSE otherwise.

marks = c(23, 67, 98, 34, 98, 21)
is.vector(marks)
## [1] TRUE

Unlike a matrix or a data frame, a vector has no dimensions, so dim() returns NULL.

marks = c(23, 67, 98, 34, 98, 21)
dim(marks)
## NULL

The length() function counts the number of elements in a vector. The vector marks = c(23, 67, 98, 34, 98, 21) has six elements, so length() returns 6.

marks = c(23, 67, 98, 34, 98, 21)
length(marks)
## [1] 6
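In R, a single value is simply a vector of length one. For that reason, is.vector(), length() and dim() behave on a lone number exactly as they do on marks. Here is a quick check:

```r
a = 23.5
is.vector(a)   # TRUE
length(a)      # 1
dim(a)         # NULL, like any other vector
```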

Practical Exercise

Create a vector named height with the elements 120.1, 118, 123.4, 130.8, 115.2 and do the following:

  1. print it out to the console using the print() function.
  2. find the data type of its elements using the class() function.
  3. use the is.vector() function to check whether it is really a vector.
  4. count the number of elements in the vector using the length() function.

An index is the position of an element in a vector. In R, indexing starts at 1, so the third element is retrieved with index 3:

marks = c(23, 67, 98, 34, 98, 21)
marks[3]
## [1] 98

The value 98 is at index 3, the third position in the vector. The first element of a vector has index 1. For instance, the first value in the vector marks is retrieved as follows:

marks = c(23, 67, 98, 34, 98, 21)
marks[1] #returns the first value
## [1] 23

The sequence continues: the second, third, fourth and fifth values have indices 2, 3, 4 and 5 respectively. In general, the nth value has index n.

Vectors can also be sliced to obtain the values over a range of indices. For instance, the code below retrieves the second to the fourth values as a vector:

marks = c(23, 67, 98, 34, 98, 21)
print(marks[2:4])
## [1] 67 98 34
is.vector(marks[2:4]) # confirm if the retrieved values are in a vector
## [1] TRUE

An element at a specific index in a vector can be excluded by placing a minus sign (-) before the index:

marks = c(23, 67, 98, 34, 98, 21)
marks[-2] #exclude the element at index 2
## [1] 23 98 34 98 21

The rev() function reverses the order of the elements in a vector:

marks = c(23, 67, 98, 34, 98, 21)
rev(marks)
## [1] 21 98 34 98 67 23
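Several positions can be selected or excluded at once by indexing with a vector of indices built with c(). Here is a short sketch on the same marks vector:

```r
marks = c(23, 67, 98, 34, 98, 21)
marks[c(1, 3, 6)]     # 23 98 21: the first, third and sixth elements
marks[-c(1, 2)]       # 98 34 98 21: drop the first two elements
marks[length(marks)]  # 21: the last element
```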

Practical Exercise

Create a vector named ages containing the values 13, 59, 27, 22, 19, 31, 43. Use it to answer the questions below.

  1. Print out the vector ages to the console
  2. Store the third element in a variable called my_age and print it out.
  3. Extract the values from the second to the fifth element and print them out.
  4. Exclude the third element
  5. Reverse the order of the elements in the vector.

2.2.1.1 Mathematical Operations in a vector

The summary() function calculates summary (descriptive) statistics:

marks = c(23, 67, 98, 34, 98, 21)
summary(marks)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   21.00   25.75   50.50   56.83   90.25   98.00

sum(), median() and mean() calculate the total, median and average of the values in a vector. The standard deviation is calculated with sd().

marks = c(23, 67, 98, 34, 98, 21)
print("MARKS")
## [1] "MARKS"
print(paste("TOTAL: ", sum(marks)))
## [1] "TOTAL:  341"
print(paste("MEDIAN: ", median(marks)))
## [1] "MEDIAN:  50.5"
print(paste("AVERAGE: ", mean(marks)))
## [1] "AVERAGE:  56.8333333333333"
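The sketch below shows sd() alongside min(), max() and range(), which give the smallest and largest values. As a check, it compares sd() with the sample standard deviation formula, which divides by n - 1:

```r
marks = c(23, 67, 98, 34, 98, 21)
min(marks)     # 21
max(marks)     # 98
range(marks)   # 21 98
sd(marks)      # sample standard deviation

# sd() uses the n - 1 denominator
n = length(marks)
manual_sd = sqrt(sum((marks - mean(marks))^2) / (n - 1))
all.equal(sd(marks), manual_sd)  # TRUE
```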
  • Vector multiplication and division: a vector can be multiplied or divided by a scalar value, or by another numeric vector of the same length. For instance, multiplying the vector marks = c(23, 67, 98, 34, 98, 21) by the scalar 2 multiplies each element of the vector by two.
marks = c(23, 67, 98, 34, 98, 21)

# Multiply each element in the vector by 2
double_marks =2 * marks 

marks
## [1] 23 67 98 34 98 21
double_marks
## [1]  46 134 196  68 196  42

The values in the vector marks can also be halved by multiplying by the scalar 0.5.

marks = c(23, 67, 98, 34, 98, 21)

# Multiply by 0.5 to scale the marks by a half
half_marks =0.5 * marks 

marks
## [1] 23 67 98 34 98 21
half_marks
## [1] 11.5 33.5 49.0 17.0 49.0 10.5

Alternatively, instead of multiplying the vector by 0.5, it can be divided by the scalar 2. This is referred to as vector division.

marks = c(23, 67, 98, 34, 98, 21)

# Scale down the marks by a half by dividing by 2 instead of multiplying by 0.5
half_marks = marks/2
half_marks
## [1] 11.5 33.5 49.0 17.0 49.0 10.5
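Addition and subtraction with a scalar work the same way: the scalar is added to, or subtracted from, every element. Here is a short sketch:

```r
marks = c(23, 67, 98, 34, 98, 21)
marks + 5   # 28 72 103 39 103 26
marks - 5   # 18 62 93 29 93 16
```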

Practical Exercise

Create a vector named weights with the following values: 67, 55, 60, 59, 57.2, 71, 62, 66, 70. Use the variable weights to solve the following problems:

  1. Calculate the; i. median weight ii. mean(average) weight iii. the total weight when summed together
  2. Calculate the summary statistics using the summary() function.
  3. Add 10 to weights and name the result added_weights.
  4. Subtract 15 from weights and name the result reduced_weights.
  5. Scale the weights by multiplying the vector by 1.5.
  6. Scale down the weights to a third by dividing the vector by 3.

Vector by vector multiplication and division

Two or more numeric vectors of equal length can be multiplied or divided by each other element-wise. The example below multiplies vector a (3, 5, 1) by vector b (7, 3, 9). Each value is multiplied by the value at the corresponding index of the other vector, such that:

  • 3 is multiplied by 7 to be 21

  • 5 is multiplied by 3 to be 15

  • 1 is multiplied by 9 to be 9.

The resultant vector is now 21 15 9.

a = c(3, 5, 1)
b = c(7, 3, 9)
ab = a*b
ab
## [1] 21 15  9
ba = b*a # element-wise multiplication is commutative, so ba equals ab
ba
## [1] 21 15  9

The same vectors can also be divided element-wise, provided they have the same length and contain numeric values. Unlike multiplication, the order of division matters. In the first case, vector a is divided by vector b, such that:

  • 3 is divided by 7 to be 0.4285714

  • 5 is divided by 3 to be 1.6666667

  • 1 is divided by 9 to be 0.1111111.

The resultant vector is now 0.4285714 1.6666667 0.1111111.

# First case
a = c(3, 5, 1)
b = c(7, 3, 9)

# Divide vector a by b
abdiv=a/b
abdiv
## [1] 0.4285714 1.6666667 0.1111111

In the second case, the order is reversed: vector b is divided by vector a (b/a instead of a/b), such that:

  • 7 is divided by 3 to be 2.333333

  • 3 is divided by 5 to be 0.600000

  • 9 is divided by 1 to be 9.000000.

The resultant vector is now 2.333333 0.600000 9.000000.

# Second case
a = c(3, 5, 1)
b = c(7, 3, 9)
# Divide vector b by a
badiv=b/a
badiv
## [1] 2.333333 0.600000 9.000000

When vectors of unequal length are multiplied, R recycles (repeats) the shorter vector to match the length of the longer one. If the longer length is not a multiple of the shorter length, R also issues a warning. The case below shows how the vectors e=c(1,2,3,4,5) and f=c(1,2) are multiplied.

  • Vector f=c(1,2) is recycled to match the length of vector e, so it is treated as c(1,2,1,2,1). Element-wise multiplication then proceeds as usual.
e=c(1,2,3,4,5)
f=c(1,2)
ef = e*f # R issues a warning (not an error)
## Warning in e * f: longer object length is not a multiple of shorter object
## length
ef # the result is still computed, with f recycled as c(1,2,1,2,1)
## [1] 1 4 3 8 5
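When the longer length is an exact multiple of the shorter one, the shorter vector is recycled without any warning. In the sketch below, c(10, 100) is recycled to c(10, 100, 10, 100):

```r
e = c(1, 2, 3, 4)
f = c(10, 100)
e * f   # 10 200 30 400, with no warning
```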

Multiple vectors can be concatenated (combined) with c() into one longer vector:

a
## [1] 3 5 1
b
## [1] 7 3 9
z=c(a,b,a) #concatenates the vectors 
z
## [1] 3 5 1 7 3 9 3 5 1

Practical Exercise

Create two vectors: vector1 with the values 4, 6, 12, 7 and vector2 with the values 7, 3, 5, 10. Use the two vectors to solve the following questions.

  1. Create vector3 by multiplying vector1 and vector2. Print it out.
  2. Create vector4a by dividing vector1 by vector2. Print it out.
  3. Create vector4b by dividing vector2 by vector1. Print it out.
  4. Is there a difference between vector4a and vector4b? If there is, what brought the difference? Write the answer as a comment.
  5. Create another vector, vector5, with the values 4, 6 and multiply it with vector1 to create vector6. Print it out.
  6. Concatenate vector1, vector2 and vector5 to come up with a giant_vector. Print it out.

2.2.1.2 Character Vectors

Vectors can also contain character values, for instance:

my_name = c("My", "name", "is", "Vipin")
my_name[5] = "Singh" # assigning to index 5 appends an element at the end
my_name
## [1] "My"    "name"  "is"    "Vipin" "Singh"

The elements of a character vector can be combined into a single string with paste(). The collapse argument sets the separator placed between the elements. Because "Singh" was appended above, my_name is combined into "My name is Vipin Singh":

print(paste(my_name, collapse=" "))
## [1] "My name is Vipin Singh"
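Any string can be used as the collapse separator. When nchar() is applied to a character vector, it counts the characters of each element. Here is a short sketch:

```r
my_name = c("My", "name", "is", "Vipin", "Singh")
paste(my_name, collapse = "_")   # "My_name_is_Vipin_Singh"
paste(my_name, collapse = "")    # "MynameisVipinSingh"
nchar(my_name)                   # 2 4 2 5 5
```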

For a character vector, summary() does not compute statistics. Instead, it reports:

  • Count/length

  • Class (data type)

  • Mode

summary(my_name)
##    Length     Class      Mode 
##         5 character character

2.2.1.3 Vectors with mixed data types

A vector can be created from a mix of character and numeric values, for instance:

numbers=c(1,"two", 3, "three")
numbers
## [1] "1"     "two"   "3"     "three"

However, a vector can hold only one data type, so R stores every element, including the numbers, as a character value. A number stored as text can be converted back to numeric with as.numeric():

as.numeric(numbers[1]) + 2
## [1] 3

Similarly, it can be converted to an integer with as.integer():

as.integer(numbers[1])
## [1] 1
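When values are mixed, R coerces them all to the most flexible type present, following the order logical, integer, numeric (double), complex, character. Text that does not look like a number becomes NA when converted with as.numeric(), and R issues a warning. Here is a short sketch:

```r
class(c(1L, 2L))     # "integer"
class(c(1L, 2.5))    # "numeric": the integer is promoted
class(c(TRUE, 2))    # "numeric": TRUE becomes 1
class(c(1, "two"))   # "character"

numbers = c(1, "two", 3, "three")
# Non-numeric text becomes NA; the warning is suppressed here
suppressWarnings(as.numeric(numbers))  # 1 NA 3 NA
```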

2.2.1.4 Named Vectors

Names can be assigned to the elements of a vector, like this:

named_vector=c(EcoR1="GAATTC", HindIII="AAGCTT", Pst1="CTGCAG")
named_vector
##    EcoR1  HindIII     Pst1 
## "GAATTC" "AAGCTT" "CTGCAG"

The names() function returns the names of the elements:

names(named_vector)
## [1] "EcoR1"   "HindIII" "Pst1"

A vector element can be accessed using its name

named_vector["EcoR1"] # find the value of a vector by its name 
##    EcoR1 
## "GAATTC"
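Names can also be attached after a vector is created, by assigning to names(). Several elements can be retrieved at once with a vector of names. Here is a short sketch:

```r
sites = c("GAATTC", "AAGCTT", "CTGCAG")
names(sites) = c("EcoR1", "HindIII", "Pst1")

sites[c("EcoR1", "Pst1")]   # two named elements
unname(sites["HindIII"])    # "AAGCTT", with the name dropped
```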

2.2.1.5 Generating number series as vectors

The seq function in R is used to generate sequences of numbers. It takes several arguments, including from, to, by, and length.out, among others, to specify the range and increment of the sequence. Here’s a brief overview of its usage:

  • from: The starting value of the sequence.
  • to: The end value of the sequence.
# Generate a sequence from 1 to 20
series = seq(from=1, to=20)
series 
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
# It can also be written as
series = seq(1,20)
series
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
class(series)
## [1] "integer"
  • by: The increment between consecutive values in the sequence.
# generate numbers 0 to 10 incremented by 0.5
series3=seq(0, 10, by=0.5)
series3
##  [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
## [16]  7.5  8.0  8.5  9.0  9.5 10.0
  • length.out (which can be abbreviated to length): The desired length of the sequence.
# generate 10 numbers from 0 to 6
series4=seq(0, 6, length=10)
series4
##  [1] 0.0000000 0.6666667 1.3333333 2.0000000 2.6666667 3.3333333 4.0000000
##  [8] 4.6666667 5.3333333 6.0000000
seq(0, 6)
## [1] 0 1 2 3 4 5 6
  • along.with: An optional vector argument specifying the length and names of the output sequence.
# Generate a sequence along with a vector
seq(along.with = c("a", "b", "c"))
## [1] 1 2 3
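For steps of 1, the colon operator is a shorthand for seq(). A negative by value produces a decreasing sequence. Here is a short sketch:

```r
identical(1:20, seq(1, 20))   # TRUE: both are integer vectors from 1 to 20
seq(10, 0, by = -2)           # 10 8 6 4 2 0
5:1                           # 5 4 3 2 1
```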

2.2.1.6 Missing (NA) values in vectors

Missing values are represented by NA (Not Available), for instance:

marks=c(78,65, 98, 87, 89, NA)
sum(is.na(marks)) # count the missing (NA) values in the vector
## [1] 1

Most built-in mathematical functions return NA if a vector contains missing values, unless the missing values are removed or ignored with na.rm = TRUE:

# sum(marks) returns NA because of the missing value
sum(marks, na.rm = TRUE) # ignore NA values when calculating the sum
## [1] 417
median(marks, na.rm = TRUE)
## [1] 87
summary(marks) # summary() skips NA values by default and reports how many there are
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    65.0    78.0    87.0    83.4    89.0    98.0       1
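Without na.rm = TRUE, a calculation on a vector that contains a missing value returns NA. Another option is to filter out the NA values first with !is.na(). Here is a short sketch:

```r
marks = c(78, 65, 98, 87, 89, NA)
mean(marks)                 # NA
mean(marks, na.rm = TRUE)   # 83.4

complete_marks = marks[!is.na(marks)]   # keep only the non-missing values
complete_marks                          # 78 65 98 87 89
mean(complete_marks)                    # 83.4
```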

2.2.2 Matrix

A matrix is a two-dimensional data structure whose elements all have the same data type. The code below shows how to create a matrix from a vector:

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)
data1
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

A vector of values 1 to 9 is converted to a matrix, with the values arranged column-wise by default.

A matrix has two dimensions: rows and columns. The dim() function returns the number of rows followed by the number of columns.

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

# find the dimensions of the matrix
dim(data1)
## [1] 3 3

The is.matrix() function checks whether a given variable is a matrix and returns a Boolean value.

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

# confirm if `data1` is really a matrix
is.matrix(data1)
## [1] TRUE

A matrix can also be created row-wise from a vector.

vector1 = seq(1, 9) 

## create a matrix by row 
data2=matrix(vector1, ncol=3, byrow=TRUE)
data2 # is a transpose of data1
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
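For a square matrix like this one, filling by row gives the transpose of filling by column. The same result can therefore be obtained with t(), which swaps rows and columns. Here is a quick check:

```r
vector1 = seq(1, 9)
data1 = matrix(vector1, ncol = 3)                # filled column by column
data2 = matrix(vector1, ncol = 3, byrow = TRUE)  # filled row by row
identical(t(data1), data2)   # TRUE: t() swaps rows and columns
```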

R reports the class of a matrix as both "matrix" and "array", since a matrix is a two-dimensional array:

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

# find the data type of `data1`
class(data1) 
## [1] "matrix" "array"

To access a specific data point in a matrix, index the matrix by row and then by column, i.e. matrix_data[row_index, column_index]:

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

# retrieve the value in the third row second column in `data1`
data1[3, 2]
## [1] 6

Leaving the column index empty returns a whole row as a vector. Here we retrieve the second row:

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

row2 = data1[2,] # access row 2 
is.vector(row2) # the row is returned as a vector
## [1] TRUE

Leaving the row index empty returns a whole column as a vector:

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

col3=data1[,3] # access column 3
is.vector(col3) # the column is returned as a vector
## [1] TRUE
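A range of rows and a range of columns can also be selected together, which returns a smaller matrix. Here is a short sketch that keeps rows 1 to 2 and columns 2 to 3 of data1:

```r
data1 = matrix(seq(1, 9), ncol = 3)
sub = data1[1:2, 2:3]   # rows 1-2, columns 2-3
sub
#      [,1] [,2]
# [1,]    4    7
# [2,]    5    8
dim(sub)   # 2 2
```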

The nrow() function counts the number of rows in a matrix:

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

nrow(data1)
## [1] 3

data1 has 3 rows.

The ncol() function counts the number of columns in a matrix:

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

ncol(data1)
## [1] 3

2.2.2.1 Mathematical Operations in a matrix

Matrix Addition: a number can be added to a matrix, or a matrix can be added to another matrix with the same number of rows and columns.

vector1 = seq(1, 9) 

# Convert to matrix
## create by column 
data1=matrix(vector1, ncol=3)

data2 = data1 + 3
data2
##      [,1] [,2] [,3]
## [1,]    4    7   10
## [2,]    5    8   11
## [3,]    6    9   12

The code above demonstrates adding a number to a matrix: adding 3 increases each value in the matrix by 3. To demonstrate matrix-to-matrix addition, we create two matrices of equal dimensions and add them together. Each cell is added to the cell in the same position of the other matrix.

data1 = matrix(seq(1, 9), ncol=3, byrow=TRUE)
print(data1)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
data2 = matrix(seq(1, 18, 2), ncol=3, byrow=TRUE)
print(data2)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    7    9   11
## [3,]   13   15   17
# Add data1 to data2
resultant_matrix = data1 + data2
resultant_matrix
##      [,1] [,2] [,3]
## [1,]    2    5    8
## [2,]   11   14   17
## [3,]   20   23   26
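Matrix-to-matrix addition requires both matrices to have the same dimensions. Otherwise, R stops with a "non-conformable arrays" error. The sketch below catches that error with tryCatch() so the message can be inspected:

```r
m1 = matrix(seq(1, 9), ncol = 3)   # 3 x 3
m2 = matrix(seq(1, 6), ncol = 3)   # 2 x 3

# Return the error message instead of stopping
result = tryCatch(m1 + m2, error = function(e) conditionMessage(e))
result   # "non-conformable arrays"
```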

Matrix Subtraction: the same concept as matrix addition applies to matrix subtraction.

data1 = matrix(seq(1, 9), ncol=3, byrow=TRUE)

data3 = data1-1 #reduce each value by 1
data3
##      [,1] [,2] [,3]
## [1,]    0    1    2
## [2,]    3    4    5
## [3,]    6    7    8

Subtracting 1 from data1 reduces each value in the matrix by 1. Let's now subtract data1 from data2.

data1 = matrix(seq(1, 9), ncol=3, byrow=TRUE)
data1
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
data2 = matrix(seq(1, 18, 2), ncol=3, byrow=TRUE)
data2
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    7    9   11
## [3,]   13   15   17
resultant_matrix = data2-data1
resultant_matrix
##      [,1] [,2] [,3]
## [1,]    0    1    2
## [2,]    3    4    5
## [3,]    6    7    8

Matrix Multiplication (scalar): a matrix can be multiplied by a scalar, in which case the scalar multiplies every cell in the matrix.

data1 = matrix(seq(1, 9), ncol=3, byrow=TRUE)
data1
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
data4 = data1*5
data4
##      [,1] [,2] [,3]
## [1,]    5   10   15
## [2,]   20   25   30
## [3,]   35   40   45

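Multiplying two matrices of the same dimensions with the * operator is done element-wise: each cell of the first matrix is multiplied by the cell in the same position of the second matrix. This is not the row-by-column matrix product (dot product) of linear algebra, which R computes with the %*% operator.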

data1 = matrix(seq(1, 9), ncol=3, byrow=TRUE)
data1
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
data2 = matrix(seq(1, 18, 2), ncol=3, byrow=TRUE)
data2
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    7    9   11
## [3,]   13   15   17
# Find the element-wise product of the two matrices
product_matrix = data1 * data2
product_matrix
##      [,1] [,2] [,3]
## [1,]    1    6   15
## [2,]   28   45   66
## [3,]   91  120  153
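For the row-by-column matrix product of linear algebra, R provides the %*% operator. Each cell of the result is the dot product of a row of the first matrix with a column of the second. Here is a minimal sketch with the same data1 and data2. Compare its output with the element-wise product above.

```r
data1 = matrix(seq(1, 9), ncol = 3, byrow = TRUE)
data2 = matrix(seq(1, 18, 2), ncol = 3, byrow = TRUE)

# Row-by-column (dot product) matrix multiplication
dot_product = data1 %*% data2
dot_product
#      [,1] [,2] [,3]
# [1,]   54   66   78
# [2,]  117  147  177
# [3,]  180  228  276
```

For example, the top-left cell is row 1 of data1 times column 1 of data2: 1*1 + 2*7 + 3*13 = 54.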

Matrix Division: dividing a matrix by a scalar divides every cell in the matrix by that value.

data1 = matrix(seq(1, 9), ncol=3, byrow=TRUE)
data1
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
# Divide `data1` matrix by 2
data5 = data1/2
data5
##      [,1] [,2] [,3]
## [1,]  0.5  1.0  1.5
## [2,]  2.0  2.5  3.0
## [3,]  3.5  4.0  4.5
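Two matrices with the same dimensions can also be divided cell by cell with /. Each cell of the first matrix is divided by the cell in the same position of the second. Here is a short sketch:

```r
data1 = matrix(seq(1, 9), ncol = 3, byrow = TRUE)
data2 = matrix(seq(1, 18, 2), ncol = 3, byrow = TRUE)

ratio = data2 / data1   # element-wise division
ratio[1, 2]   # 1.5, i.e. 3 / 2
ratio[2, 1]   # 1.75, i.e. 7 / 4
```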

2.2.3 Data frame

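A data frame is a two-dimensional data structure with rows and columns, like a matrix. Unlike a matrix, its columns can hold different data types, for example a character column of names alongside numeric columns of scores.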

Let's convert a matrix into a data frame:

vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector 

# Adding a column student 
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1) 
data
##   Students X1 X2 X3 X4
## 1   Pragya  1  4  7 10
## 2  Deepika  2  5  8 11
## 3 Chandran  3  6  9 12

The data above shows the scores of different students in different subjects. R generates the column names (X1 to X4) automatically, but descriptive column names can be assigned with colnames(), as shown below.

vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector 

# Adding a column student 
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1) 
data
##   Students X1 X2 X3 X4
## 1   Pragya  1  4  7 10
## 2  Deepika  2  5  8 11
## 3 Chandran  3  6  9 12
# Create column names
headers=c("Students", "Genomics", "Proteomics", "Microbiology", "Biostatistics")
colnames(data)=headers #add column names 
data
##   Students Genomics Proteomics Microbiology Biostatistics
## 1   Pragya        1          4            7            10
## 2  Deepika        2          5            8            11
## 3 Chandran        3          6            9            12
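A data frame can also be built directly from named columns, and its columns can hold different data types. The sketch below does this. The name student_info and the Age values are hypothetical and are used only for illustration.

```r
student_info = data.frame(
  Name = c("Pragya", "Deepika", "Chandran"),
  Age = c(22L, 24L, 23L),   # hypothetical ages, for illustration only
  stringsAsFactors = FALSE  # keep Name as character (the default since R 4.0)
)
sapply(student_info, class)   # Name "character", Age "integer"
nrow(student_info)            # 3
student_info$Name[2]          # "Deepika"
```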

The rowSums() function adds up the values in each row of a data frame. Here it gives the total score of each student across the four units.

vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector 

# Adding a column student 
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1) 
data
##   Students X1 X2 X3 X4
## 1   Pragya  1  4  7 10
## 2  Deepika  2  5  8 11
## 3 Chandran  3  6  9 12
## Add a new column with total marks obtained 
data$total_marks=rowSums(data[, c(2, 3, 4, 5)]) #add from second to fifth column 
data
##   Students X1 X2 X3 X4 total_marks
## 1   Pragya  1  4  7 10          22
## 2  Deepika  2  5  8 11          26
## 3 Chandran  3  6  9 12          30

The rowMeans() function calculates the average of each row (record). Here it gives the average score of each student.

vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector 

# Adding a column student 
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1) 
data
##   Students X1 X2 X3 X4
## 1   Pragya  1  4  7 10
## 2  Deepika  2  5  8 11
## 3 Chandran  3  6  9 12
data$average_marks=rowMeans(data[, c(2, 3, 4, 5)])
data # confirm if the new column is added
##   Students X1 X2 X3 X4 average_marks
## 1   Pragya  1  4  7 10           5.5
## 2  Deepika  2  5  8 11           6.5
## 3 Chandran  3  6  9 12           7.5
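colMeans() works the same way across columns, giving the average score per subject rather than per student. Here is a short sketch on the same data, where columns 2 to 5 hold the scores:

```r
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol = 4)
Students = c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)

# Average of each score column
colMeans(data[, 2:5])   # X1 2, X2 5, X3 8, X4 11
```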

2.3 Hands-on Exercises

  1. Basic Data Types
  • Create an integer variable age with value 25.
  • Create a numeric variable height representing height in meters.
  • Define a string variable name with the value "Alex".
  • Create a boolean variable is_student indicating whether someone is a student or not.
  • Create a complex number variable z representing 2 + 3i
  • Define a raw data variable byte_value that stores the hexadecimal value 0x1a.
  2. Operators
  • Add two integers: 12 + 8.
  • Divide two numbers: 45.5 / 5.
  • Create a logical comparison to check if age is greater than 20.
  • Create a logical comparison to check if height is equal to 1.75.
  3. Vectors
  • Create a numeric vector numbers with values 2, 4, 6, 8, 10.
  • Create a character vector colors containing "red", "blue", "green", "yellow", "purple".
  • Append the value 12 to the vector numbers.
  4. Matrix
  • Create a 3x3 matrix A with values from 1 to 9.
  • Create another 3x3 matrix B with values from 9 to 1.
  5. Dataframes
  • Create a dataframe students_df with the columns Name, Age, and Grade for three students.
  • Add a new column Gender to the data frame students_df.
  6. Vector and Matrix Operations
  • Add the vectors c(2, 4, 6) and c(1, 3, 5).
  • Multiply the matrices A and B from the Matrix exercise above.