Chapter 2 Basic Data Types and Structures
2.1 Data types
R has several kinds of values that can be stored in variables. The class() function is used to check the data type of a value or a variable. The data types include:
- Numeric
These represent numeric values such as integers and decimals. They are used for mathematical expressions and quantitative data analysis. The code below finds the data type of the variable a, which is assigned 23.5, and returns numeric.
## [1] "numeric"
A whole number without a decimal point is also numeric, for instance 45, 8, 0 and 73. Run the code chunks below to find the data type of each value.
## [1] "numeric"
## [1] "numeric"
## [1] "numeric"
## [1] "numeric"
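A minimal sketch of code that would produce the outputs above. The variable name a and the values come from the text; the exact calls are an assumption about the original code chunks.

```r
# Assign a decimal value to a variable and check its data type
a = 23.5
class(a)

# Whole numbers written without the L suffix are also numeric
class(45)
class(8)
class(0)
class(73)
```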
Practical Exercise
Answer the questions below;
- Find the data type of 98.03 using the class() function.
- Assign the value 98.03 to the variable height and find the data type of height.
- Integers
They represent whole numbers without any decimals and are a subclass of numeric. L is added at the end of a whole number to indicate that it is an integer.
## [1] "integer"
Let's store age as an integer. Note the L after the number 27.
## [1] "integer"
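As a sketch of how the integer outputs above could be obtained (the value 5L is a hypothetical example; age and 27 come from the text):

```r
# The L suffix tells R to store a whole number as an integer
class(5L)

# Store age as an integer
age = 27L
class(age)

# An integer still counts as a numeric value
is.numeric(age)
```

Without the L suffix, the same value 27 would be stored as numeric.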
Practical Exercise
Answer the questions below;
- Find the data type of any whole number using the class() function. Remember to add L after the digits.
- There are 27 goats in a field. Assign the quantity of goats to a variable goats and find the data type of the variable goats.
- Characters
They represent text strings such as names, sentences and labels. They are enclosed in double quotes (") or single quotes (').
## [1] "character"
Let's use a name as a character value.
## [1] "character"
The same applies to an object (variable) that holds text:
## [1] "character"
Character values can contain spaces, for instance:
## [1] "character"
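The character examples above might look like the sketch below. The example strings and the variable names name and full_name are assumptions chosen for illustration.

```r
# Any quoted text is a character value
class("hello")

# A name stored in a variable is also a character value
name = "Vipin"
class(name)

# Spaces are allowed inside the quotes
full_name = "Vipin Patel"
class(full_name)

# Count the characters, including the space
nchar(full_name)
```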
Practical Exercise
In the code cell below;
- Find the data type of the value "school" using the class() function.
- Assign your first name to a variable firstname and find its data type. Remember to enclose it in quotation marks.
- Assign your full name to a variable full_name and find its data type. For instance, if your name is “Vipin Patel”, assign it as full_name = "Vipin Patel" and find its data type. Remember to enclose the value in quotation marks since it is a character data type.
- Logical
They represent Boolean values, which take one of only two distinct values: TRUE or FALSE.
## [1] "logical"
changing it to FALSE
## [1] "logical"
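A short sketch of the logical examples above, using a hypothetical variable name is_raining:

```r
# Store TRUE in a variable and check its data type
is_raining = TRUE
class(is_raining)

# Changing the value to FALSE keeps the same data type
is_raining = FALSE
class(is_raining)

# Comparisons also produce logical values
result = 5 > 3
class(result)
```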
Practical Exercise
Assign TRUE to a variable grateful and find the data type of the variable.
- Complex
They represent complex numbers with real and imaginary parts
## [1] "complex"
2 is the real part while 3i is the imaginary part. Complex numbers can also be created with the complex() function, with real and imaginary as the arguments.
## [1] 3+7i
## [1] "complex"
Let's try other values that fit the complex data type.
2+5i
## [1] 2+5i
## [1] "complex"
7 + 6i
## [1] 7+6i
## [1] "complex"
4i - 1
## [1] -1+4i
## [1] "complex"
A complex value can consist of the imaginary part only, without a real number; R then assumes the real part to be 0 (zero). For instance:
## [1] 0+3i
## [1] "complex"
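The complex examples above can be sketched as follows. The variable names z and w are hypothetical; the values come from the text and outputs.

```r
# A complex number with real part 2 and imaginary part 3
z = 2 + 3i
class(z)
Re(z)   # real part
Im(z)   # imaginary part

# The same kind of value built with the complex() function
w = complex(real = 3, imaginary = 7)
w
class(w)

# The order of the terms does not matter
4i - 1

# With only an imaginary part, the real part is taken as 0
3i
Re(3i)
```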
Practical Exercise
Find the data type of the following values. One of them is a numeric element.
- 3i + 8
- 5 - 1i
- 4i
- 12
- Raw
They represent a vector of bytes in raw form and are used to store binary data. Example:
## [1] 44 4e 41
## [1] "raw"
## [1] "character"
“Hello World” can be represented as in the results below when converted to the raw data type.
## [1] 48 65 6c 6c 6f 20 57 6f 72 6c 64
## [1] "raw"
Numeric values can also be represented as raw vectors:
## [1] 1b
## [1] "raw"
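A sketch that reproduces the raw outputs above. The input strings are inferred from the byte values (44 4e 41 spells "DNA", and 1b is 27 in hexadecimal), so treat them as assumptions about the original code.

```r
# Convert text to its bytes
dna_raw = charToRaw("DNA")
dna_raw
class(dna_raw)

# Converting back gives a character value again
class(rawToChar(dna_raw))

# A longer string, one byte per character
hello_raw = charToRaw("Hello World")
hello_raw
class(hello_raw)

# A number in the range 0 to 255 becomes a single byte
as.raw(27)
class(as.raw(27))
```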
Practical Exercise
Convert the following values to raw data types. Hint: use the charToRaw() function for character data types and as.raw() for other data types.
- "Vipin"
- 27
- 69.0
- FALSE
- 12L
2.2 Data Structures
A data structure organizes one or more data values in a specific arrangement. The data structures in R covered here include:
Vector
Matrix
Data frame
2.2.1 Vector
A vector is a single entity consisting of an ordered collection of values. Vectors are versatile and provide the basis of many operations in statistics and data manipulation, so knowledge of vectors is important for effective programming in R. Vectors are created using the c() function. Here is an example of a vector.
## [1] 23 67 98 34 98 21
Practical Exercise
Create a vector named ages and insert the following values 21, 32, 22, 24, 27, 54, 20, 13 and print it out on the console
The class function is utilized to determine the data types present within vector data values.
## [1] "numeric"
The vector marks consists of only numeric values.
The is.vector() function is used to check whether a variable is a vector. It returns a Boolean value: TRUE if the variable is a vector and FALSE otherwise.
## [1] TRUE
Unlike a matrix or a data frame, a vector has no dimensions, so dim() returns NULL.
## NULL
The length() function counts the number of elements in a vector. In our case the vector marks = c(23, 67, 98, 34, 98, 21) has six elements, so length() returns 6.
## [1] 6
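The vector checks described above can be sketched in one place. The vector marks and its values come from the text; the sequence of calls is an assumption about the original code chunks.

```r
# Create a vector with the c() function
marks = c(23, 67, 98, 34, 98, 21)
marks

# Data type of the elements
class(marks)

# Confirm that marks is a vector
is.vector(marks)

# A plain vector has no dimensions
dim(marks)

# Count the elements
length(marks)
```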
Practical Exercise
Create a vector named height with the values 120.1, 118, 123.4, 130.8, 115.2 and do the following:
- Print it out to the console using the print() function.
- Find the data type of its elements using the class() function.
- Use the is.vector() function to check whether it is really a vector.
- Count the number of elements in the vector using the length() function.
An index is the position of an element in a vector. In R, indexing starts at 1. For example, index 3 retrieves the third element.
## [1] 98
The value 98 is at index 3, that is, it is the third element in the vector. The first element of a vector has index 1, as shown by retrieving the first value in the vector marks.
## [1] 23
The sequence goes on: the second, third, fourth, fifth … values are indexed as 2, 3, 4, 5 … respectively, i.e. the nth value is indexed as n.
Vectors can also be sliced to obtain values over a range of indices. For instance, the code below retrieves the second to the fourth values as a vector.
## [1] 67 98 34
## [1] TRUE
An element at a specific index in a vector can be excluded by adding a - sign before the index value.
## [1] 23 98 34 98 21
rev() command is used to reverse the order of elements in a vector
## [1] 21 98 34 98 67 23
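The indexing operations above can be sketched as follows. The index values are inferred from the outputs shown (for example, excluding index 2 removes 67); the variable name middle is hypothetical.

```r
marks = c(23, 67, 98, 34, 98, 21)

# Single elements by position (indexing starts at 1)
marks[3]
marks[1]

# Slice from the second to the fourth element; the result is still a vector
middle = marks[2:4]
middle
is.vector(middle)

# A negative index excludes the element at that position
marks[-2]

# Reverse the order of the elements
rev(marks)
```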
Practical Exercise
Create a vector named ages and insert the following values; 13, 59, 27, 22, 19, 31, 43. Use it to answer the questions below.
- Print out the vector ages to the console.
- Store the third element in a variable called my_age and print it out.
- Extract the values from the second to the fifth element and print them out.
- Exclude the third element
- Reverse the order of the elements in the vector.
2.2.1.1 Mathematical Operations in a vector
The summary/descriptive statistics are calculated by summary() command.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 25.75 50.50 56.83 90.25 98.00
sum(), median() and mean() are used to calculate the total, median and average of the values in a vector.
## [1] "MARKS"
## [1] "TOTAL: 341"
## [1] "MEDIAN: 50.5"
## [1] "AVERAGE: 56.8333333333333"
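A sketch of how the statistics above could be computed and printed. The use of print() with paste() is an assumption based on the format of the outputs.

```r
marks = c(23, 67, 98, 34, 98, 21)

# Descriptive statistics in one call
summary(marks)

# Individual statistics with labels
print("MARKS")
print(paste("TOTAL:", sum(marks)))
print(paste("MEDIAN:", median(marks)))
print(paste("AVERAGE:", mean(marks)))
```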
- Vector multiplication and division - vectors can be multiplied or divided by a scalar value or by another vector of the same length and numeric data type. For instance, the vector marks = c(23, 67, 98, 34, 98, 21) is multiplied by the scalar value 2 below, which multiplies each element in the vector by two.
marks = c(23, 67, 98, 34, 98, 21)
# Multiply each element in the vector by 2
double_marks = 2 * marks
marks
## [1] 23 67 98 34 98 21
double_marks
## [1] 46 134 196 68 196 42
The values in the vector marks can also be scaled down to a half when multiplied by a scalar value 0.5.
marks = c(23, 67, 98, 34, 98, 21)
# Multiply by 0.5 to scale the marks by a half
half_marks = 0.5 * marks
marks
## [1] 23 67 98 34 98 21
half_marks
## [1] 11.5 33.5 49.0 17.0 49.0 10.5
Alternatively, instead of multiplying the vector by 0.5, it can be divided by the scalar value 2. This is what is referred to as vector division.
marks = c(23, 67, 98, 34, 98, 21)
# Scale down the marks by a half by dividing by 2 instead of multiplying by 0.5
half_marks = marks/2
half_marks
## [1] 11.5 33.5 49.0 17.0 49.0 10.5
Practical Exercise
Create a vector with the following values: 67, 55, 60, 59, 57.2, 71, 62, 66, 70 and name the vector weights. Use the variable weights to solve the following problems.
- Calculate (i) the median weight, (ii) the mean (average) weight and (iii) the total weight when summed together.
- Calculate the summary statistics using the summary() function.
- Add 10 to the variable weights and name the answer added_weights.
- Subtract 15 from weights and name the result reduced_weights.
- Scale the weights by multiplying the vector by 1.5.
- Scale down the weights to a third by dividing the vector by 3.
Vector by vector multiplication and division
Two or more numeric vectors of equal length can be multiplied or divided by each other. The example below demonstrates vector by vector multiplication of vector a: 3, 5, 1 and vector b: 7, 3, 9. Each value is multiplied by the value at the corresponding index in the other vector such that:
- 3 is multiplied by 7 to give 21
- 5 is multiplied by 3 to give 15
- 1 is multiplied by 9 to give 9
The resultant vector is now 21 15 9.
## [1] 21 15 9
## [1] 21 15 9
The same vectors can also be divided by each other, provided they are of the same length and all have numeric values. The order of division matters. In the first case, vector a is divided by vector b such that:
- 3 is divided by 7 to give 0.4285714
- 5 is divided by 3 to give 1.6666667
- 1 is divided by 9 to give 0.1111111
The resultant vector is now 0.4285714 1.6666667 0.1111111.
## [1] 0.4285714 1.6666667 0.1111111
In the second case, the order of division is reversed, with vector b divided by vector a (b/a instead of a/b) such that:
- 7 is divided by 3 to give 2.333333
- 3 is divided by 5 to give 0.600000
- 9 is divided by 1 to give 9.000000
The resultant vector is now 2.333333 0.600000 9.000000.
## [1] 2.333333 0.600000 9.000000
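The element-by-element arithmetic described above can be sketched with the vectors a and b from the text. The names product, a_over_b and b_over_a are hypothetical.

```r
a = c(3, 5, 1)
b = c(7, 3, 9)

# Each element of a is multiplied by the element of b at the same index
product = a * b
product

# Division also works index by index, and the order matters
a_over_b = a / b
b_over_a = b / a
a_over_b
b_over_a
```

Because the operations pair elements by index, a * b equals b * a, while a / b and b / a generally differ.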
However, when multiplying vectors of unequal length, the shorter one is recycled (repeated) to match the longer vector. R returns a warning when the longer length is not a multiple of the shorter length. The case below shows how the vectors e = c(1, 2, 3, 4, 5) and f = c(1, 2) are multiplied.
- Vector f = c(1, 2) is recycled to match the length of vector e, so f is treated as c(1, 2, 1, 2, 1). The multiplication then proceeds element by element as described above.
## Warning in e * f: longer object length is not a multiple of shorter object
## length
## [1] 1 4 3 8 5
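To make the recycling explicit, the sketch below builds the repeated version of f by hand with rep(). The name f_recycled is hypothetical and is used only for illustration.

```r
e = c(1, 2, 3, 4, 5)
f = c(1, 2)

# R recycles f automatically and warns because 5 is not a multiple of 2
e * f

# The same result with the recycling written out explicitly
f_recycled = rep(f, length.out = length(e))
f_recycled
e * f_recycled
```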
Multiple vectors can be concatenated/combined to come up with one giant vector
## [1] 3 5 1
## [1] 7 3 9
## [1] 3 5 1 7 3 9 3 5 1
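The concatenation output above is consistent with passing the vectors to c() in the order a, b, a. The name giant is a hypothetical choice.

```r
a = c(3, 5, 1)
b = c(7, 3, 9)
a
b

# c() joins vectors end to end into one longer vector
giant = c(a, b, a)
giant
length(giant)
```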
Practical Exercise
Create two vectors, vector1: 4, 6, 12, 7 and vector2: 7, 3, 5, 10. Use the two vectors to solve the following questions.
- Create vector3 by multiplying vector1 and vector2. Print it out.
- Create vector4a by dividing vector1 by vector2. Print it out.
- Create vector4b by dividing vector2 by vector1. Print it out.
- Is there a difference between vector4a and vector4b? If there is, what brought the difference? Write the answer as a comment.
- Create another vector, vector5: 4, 6, and multiply it with vector1 to come up with vector6. Print it out.
- Concatenate vector1, vector2 and vector5 to come up with a giant_vector. Print it out.
2.2.1.2 Character Vectors
Vectors can also contain character data types for instance
## [1] "My" "name" "is" "Vipin" "Singh"
The elements of a character vector can be combined into a single string. For instance, the vector my_name = c("My", "name", "is", "Vipin", "Singh") is combined into "My name is Vipin Singh". The collapse argument of paste() is used as below:
## [1] "My name is Vipin Singh"
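A minimal sketch of the collapse step, assuming paste() is the function used. The name sentence is hypothetical.

```r
my_name = c("My", "name", "is", "Vipin", "Singh")
my_name

# collapse joins all elements into one string, separated by a space
sentence = paste(my_name, collapse = " ")
sentence

# The result is a single character value rather than five
length(sentence)
```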
The summary() function can also be applied to a character vector. It reports:
Count/length
Class (data type)
Mode
## Length Class Mode
## 5 character character
2.2.1.3 Vectors with mixed data types
A vector can also be created from a mix of character values and numeric values, for instance:
## [1] "1" "two" "3" "three"
However, R stores every element of such a vector as the character data type. An element that holds a number can be converted to numeric by:
## [1] 3
and to an integer by:
## [1] 1
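The coercion described above can be sketched as follows. The vector name mixed and the choice of elements to convert are assumptions inferred from the outputs.

```r
# Mixing numbers and text turns every element into a character value
mixed = c(1, "two", 3, "three")
mixed
class(mixed)

# Convert a single element that holds a number back to numeric
as.numeric(mixed[3])

# Or to an integer
as.integer(mixed[1])
```

Elements such as "two" cannot be converted this way; as.numeric("two") gives NA with a warning.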
2.2.1.4 Named Vectors
Names can be assigned to the elements of a vector, for instance:
## EcoR1 HindIII Pst1
## "GAATTC" "AAGCTT" "CTGCAG"
The names of the values are accessed as follows:
## [1] "EcoR1" "HindIII" "Pst1"
A vector element can be accessed using its name
## EcoR1
## "GAATTC"
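A sketch of a named vector matching the outputs above. The vector name enzymes is hypothetical; the element names and values come from the outputs.

```r
# Give each element a name when creating the vector
enzymes = c(EcoR1 = "GAATTC", HindIII = "AAGCTT", Pst1 = "CTGCAG")
enzymes

# List the names
names(enzymes)

# Access an element by its name instead of its position
enzymes["EcoR1"]
```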
2.2.1.5 Generating number series as vectors
The seq function in R is used to generate sequences of numbers. It takes several arguments, including from, to, by, and length.out, among others, to specify the range and increment of the sequence. Here’s a brief overview of its usage:
from: The starting value of the sequence.
to: The end value of the sequence.
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## [1] "integer"
by: The increment between consecutive values in the sequence.
## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
## [16] 7.5 8.0 8.5 9.0 9.5 10.0
length.out: The desired length of the sequence.
## [1] 0.0000000 0.6666667 1.3333333 2.0000000 2.6666667 3.3333333 4.0000000
## [8] 4.6666667 5.3333333 6.0000000
## [1] 0 1 2 3 4 5 6
along.with: An optional vector argument specifying the length and names of the output sequence.
## [1] 1 2 3
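The calls below are one set of seq() arguments that would give the outputs shown in this section. The specific argument values are inferred from those outputs and are therefore assumptions.

```r
# from and to: whole numbers from 1 to 20
seq(from = 1, to = 20)

# The colon operator is a shortcut for the same sequence
1:20
class(1:20)

# by: step of 0.5 between 0 and 10
seq(0, 10, by = 0.5)

# length.out: 10 evenly spaced values between 0 and 6
seq(0, 6, length.out = 10)

# length.out: 7 evenly spaced values between 0 and 6
seq(0, 6, length.out = 7)

# along.with: one index per element of another vector
seq(along.with = c("a", "b", "c"))
```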
2.2.1.6 Null data points in vectors
Missing or blank values are represented in R by NA (Not Available), for instance:
## [1] 1
Built-in functions for mathematical operations return NA if NA values exist in a vector, unless the NA values are removed/ignored.
#sum(marks) # returns NA because the vector contains a missing value
sum(marks, na.rm = TRUE) # remove NA values before calculating the sum
## [1] 417
## [1] 87
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 65.0 78.0 87.0 83.4 89.0 98.0 1
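A sketch of the NA handling above. The values of marks are reconstructed from the summary output (five known values that total 417, plus one NA), and their order is an assumption.

```r
# One mark is missing
marks = c(65, 78, NA, 87, 89, 98)

# Count the missing values
sum(is.na(marks))

# Without na.rm the result is NA
sum(marks)

# na.rm = TRUE drops the NA values before calculating
sum(marks, na.rm = TRUE)
median(marks, na.rm = TRUE)

# summary() reports the number of NA values separately
summary(marks)
```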
2.2.2 Matrix
A matrix is a two-dimensional data structure that contains a single class of data. The code below shows how one can produce a matrix from a vector.
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
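The matrix above can be produced with a sketch like this one, using the names vector1 and data1 that appear in the later code of this section.

```r
# A vector of the values 1 to 9
vector1 = seq(1, 9)

# Arrange the values in 3 columns; they are filled column by column
data1 = matrix(vector1, ncol = 3)
data1
```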
A vector of values 1 to 9 is converted to a matrix, and the values are arranged column-wise by default.
A matrix has two dimensions, rows and columns. The dim() function returns the number of each.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# find the dimensions of the matrix
dim(data1)
## [1] 3 3
The is.matrix() function is used to confirm whether a given variable is a matrix. It returns a Boolean value.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# confirm if `data1` is really a matrix
is.matrix(data1)
## [1] TRUE
A matrix can also be created row-wise from a vector.
vector1 = seq(1, 9)
## create a matrix by row
data2=matrix(vector1, ncol=3, byrow=TRUE)
data2 # is a transpose of data1
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
R reports the class of a matrix as both "matrix" and "array".
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# find the data type of `data1`
class(data1)
## [1] "matrix" "array"
To access a specific data point in a matrix, the matrix is indexed by row and then by column, for instance matrix_data[row_index, column_index].
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# retrieve the value in the third row second column in `data1`
data1[3, 2]
## [1] 6
To access a single row, leave the column index empty. In this case we retrieve the second row, which is returned as a vector.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
row2 = data1[2,] # access row 2
is.vector(row2) # confirm that the row is returned as a vector
## [1] TRUE
To access a single column
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
col3=data1[,3] # access column 3
is.vector(col3) # confirm that the column is returned as a vector
## [1] TRUE
Count the number of rows in a matrix
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
nrow(data1)
## [1] 3
data1 has 3 rows
Count the number of columns in a matrix
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
ncol(data1)
## [1] 3
2.2.2.1 Mathematical Operations in a matrix
Matrix Addition
Matrix addition can be done by adding a number to the matrix, or by adding another matrix with an equal number of rows and columns.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
data2 = data1 + 3
data2
## [,1] [,2] [,3]
## [1,] 4 7 10
## [2,] 5 8 11
## [3,] 6 9 12
The code snippet above demonstrates adding a numeric value to a matrix: adding 3 to a matrix increases each value in the matrix by 3. To demonstrate matrix to matrix addition, we create two matrices of equal dimensions and add them to each other.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 7 9 11
## [3,] 13 15 17
## [,1] [,2] [,3]
## [1,] 2 5 8
## [2,] 11 14 17
## [3,] 20 23 26
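A sketch of the matrix to matrix addition shown above. The way data1 and data2 are built is an assumption that reproduces the two matrices in the output; data3 is a hypothetical name for the result.

```r
# Two 3x3 matrices filled row by row
data1 = matrix(1:9, ncol = 3, byrow = TRUE)
data2 = matrix(seq(1, 17, by = 2), ncol = 3, byrow = TRUE)
data1
data2

# Addition pairs the cells at the same row and column
data3 = data1 + data2
data3
```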
Matrix Subtraction
The same concept of matrix addition applies to matrix subtraction as well.
## [,1] [,2] [,3]
## [1,] 0 1 2
## [2,] 3 4 5
## [3,] 6 7 8
Subtracting 1 from data1 reduces each value in the matrix by 1. Let's now subtract data1 from data2.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 7 9 11
## [3,] 13 15 17
## [,1] [,2] [,3]
## [1,] 0 1 2
## [2,] 3 4 5
## [3,] 6 7 8
Matrix Multiplication (scalar)
A matrix can be multiplied by a scalar, whereby the scalar value multiplies all the cells in the matrix.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 5 10 15
## [2,] 20 25 30
## [3,] 35 40 45
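Multiplying two matrices with the * operator is element-wise: each cell of the first matrix is multiplied by the cell at the same position in the second matrix, which is what the output below shows. The row by column matrix product, in which each row of the first matrix is combined with each column of the second, uses the %*% operator instead.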
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 7 9 11
## [3,] 13 15 17
## [,1] [,2] [,3]
## [1,] 1 6 15
## [2,] 28 45 66
## [3,] 91 120 153
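The result above matches element-wise multiplication with the * operator. The sketch below contrasts it with the row by column matrix product computed by %*%. The construction of data1 and data2 is an assumption that reproduces the matrices in the output, and the result names are hypothetical.

```r
data1 = matrix(1:9, ncol = 3, byrow = TRUE)
data2 = matrix(seq(1, 17, by = 2), ncol = 3, byrow = TRUE)

# Element-wise: cell [i, j] of data1 times cell [i, j] of data2
elementwise = data1 * data2
elementwise

# Matrix product: row i of data1 combined with column j of data2
matrix_product = data1 %*% data2
matrix_product
```

For example, the first cell of the matrix product is 1*1 + 2*7 + 3*13 = 54, whereas the first cell of the element-wise result is 1*1 = 1.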
Matrix division
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 0.5 1.0 1.5
## [2,] 2.0 2.5 3.0
## [3,] 3.5 4.0 4.5
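The division output above is consistent with dividing every cell of data1 by the scalar 2. This is a sketch under that assumption; halved is a hypothetical name.

```r
data1 = matrix(1:9, ncol = 3, byrow = TRUE)
data1

# Dividing by a scalar divides every cell
halved = data1 / 2
halved

# Dividing by a matrix of the same dimensions works cell by cell
data1 / data1
```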
2.2.3 Data frame
A data frame is a two-dimensional data structure, like a 2D array/matrix, with rows and columns.
Let's convert a matrix into a data frame.
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
The data above shows the scores of different students in different subjects. The column names are generated automatically by R; however, they can be set as shown below.
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
# Create column names
headers=c("Students", "Genomics", "Proteomics", "Microbiology", "Biostatistics")
colnames(data)=headers #add column names
data
## Students Genomics Proteomics Microbiology Biostatistics
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
A row wise addition can be performed on a data frame to find the total scores for each student in the four units
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
## Add a new column with total marks obtained
data$total_marks=rowSums(data[, c(2, 3, 4, 5)]) #add from second to fifth column
data
## Students X1 X2 X3 X4 total_marks
## 1 Pragya 1 4 7 10 22
## 2 Deepika 2 5 8 11 26
## 3 Chandran 3 6 9 12 30
Find the average score for each student. rowMeans() is used to calculate the average of each row/record.
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
## Students X1 X2 X3 X4 average_marks
## 1 Pragya 1 4 7 10 5.5
## 2 Deepika 2 5 8 11 6.5
## 3 Chandran 3 6 9 12 7.5
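The average_marks column above could be added with a sketch like the following, mirroring the rowSums() example. The column selection 2:5 is an assumption.

```r
Students = c("Pragya", "Deepika", "Chandran")
matrix1 = matrix(1:12, ncol = 4)   # scores in four subjects
data = data.frame(Students, matrix1)

# Average the second to fifth columns of each row
data$average_marks = rowMeans(data[, 2:5])
data
```

The Students column is left out of the selection because rowMeans() needs numeric columns.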
2.3 Hands-on Exercises
- Basic Data Types
  - Create an integer variable age with value 25.
  - Create a numeric variable height representing height in meters.
  - Define a string variable name with the value "Alex".
  - Create a boolean variable is_student indicating whether someone is a student or not.
  - Create a complex number variable z representing 2 + 3i.
  - Define a raw data variable byte_value that stores the hexadecimal value 0x1a.
- Operators
  - Add two integers: 12 + 8.
  - Divide two numbers: 45.5 / 5.
  - Create a logical comparison to check if age is greater than 20.
  - Create a logical comparison to check if height is equal to 1.75.
- Vectors
  - Create a numeric vector numbers with values 2, 4, 6, 8, 10.
  - Create a character vector colors containing "red", "blue", "green", "yellow", "purple".
  - Append the value 12 to the vector numbers.
- Matrix
  - Create a 3x3 matrix A with values from 1 to 9.
  - Create another 3x3 matrix B with values from 9 to 1.
- Dataframes
  - Create a dataframe students_df with the columns Name, Age, and Grade for three students.
  - Add a new column Gender to the data frame students_df.
- Vector and Matrix Operations
  - Add the vectors c(2, 4, 6) and c(1, 3, 5).
  - Multiply the matrices A and B from question 4.