Chapter 2 Basic Data Types and Structures
2.1 Data types
R works with several kinds of values, which can be stored in variables and manipulated. The class() function is used to check the data type of a value or a variable. The basic data types include:
- Numeric
These represent numeric values such as integers and decimals. They are used for mathematical expressions and quantitative data analysis. The code below finds the data type of the variable a, which is assigned 23.5, and returns numeric.
## [1] "numeric"
A whole number without a decimal point is also numeric, for instance 45, 8, 0 and 73. Run the code chunks below to find the data type of each value.
## [1] "numeric"
## [1] "numeric"
## [1] "numeric"
## [1] "numeric"
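The checks described above can be sketched as follows. This is a minimal reconstruction based on the values named in the text (23.5, 45, 8, 0 and 73); the exact calls in the original chunks are an assumption.

```r
# Assign a decimal value to a variable and check its data type
a = 23.5
class(a)    # "numeric"

# Whole numbers written without the L suffix are numeric as well
class(45)   # "numeric"
class(8)    # "numeric"
class(0)    # "numeric"
class(73)   # "numeric"
```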
Practical Exercise
Answer the questions below;
- Find the data type of 98.03 using the class() function.
- Assign the value 98.03 to the variable height and find the data type of height.
- Integers
They represent whole numbers without any decimals and are a special kind of numeric value. An L is added at the end of a whole number to indicate that it is an integer.
## [1] "integer"
Let's store age as an integer. Note the L after the number 27.
## [1] "integer"
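A short sketch of the integer examples above. The value 27L for age comes from the text; the first call uses the same value for illustration, which is an assumption.

```r
# The L suffix tells R to store the whole number as an integer
class(27L)        # "integer"

# Store age as an integer and check its data type
age = 27L
class(age)        # "integer"

# Integers still count as numeric values
is.numeric(age)   # TRUE

# Without the L suffix the same number is stored as numeric (double)
class(27)         # "numeric"
```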
Practical Exercise
Answer the questions below;
- Find the data type of any whole number using the class() function. Remember to add L after the digits.
- There are 27 goats in a field. Assign the quantity of goats to a variable goats and find the data type of the variable goats.
- Characters
They represent text strings such as names, sentences and labels. They are enclosed in double quotes (") or single quotes (').
## [1] "character"
Let's use a name as a character value
## [1] "character"
and likewise the name of an object
## [1] "character"
Character values can contain empty spaces in between, for instance:
## [1] "character"
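The character examples above might look like the sketch below. All the string values and the variable names (name, object_name, sentence) are hypothetical, chosen only to illustrate the four cases in the text.

```r
# A single quoted letter is a character value
class("a")            # "character"

# A person's name stored as a character value
name = "Vipin"
class(name)           # "character"

# The name of an object, here written with single quotes
object_name = 'chair'
class(object_name)    # "character"

# Character values may contain spaces
sentence = "R is used for data analysis"
class(sentence)       # "character"
```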
Practical Exercise
In the code cell below:
- Find the data type of the value "school" using the class() function.
- Assign your first name to a variable firstname and find its data type. Remember to enclose it in quotation marks.
- Assign your full name to a variable full_name and find its data type. For instance, if your name is "Vipin Patel", assign it like full_name = "Vipin Patel". Remember to enclose the value in quotation marks since it is a character data type.
- Logical
They represent boolean values, which take one of only two distinct values: TRUE or FALSE.
## [1] "logical"
Changing the value to FALSE gives the same data type:
## [1] "logical"
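A minimal sketch of the two logical checks above. The variable name is_raining is hypothetical.

```r
# TRUE and FALSE are written in capital letters and without quotes
is_raining = TRUE
class(is_raining)   # "logical"

# Changing the value to FALSE does not change the data type
is_raining = FALSE
class(is_raining)   # "logical"
```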
Practical Exercise
Assign TRUE to a variable grateful and find the data type of the variable.
- Complex
They represent complex numbers with real and imaginary parts
## [1] "complex"
Here 2 is the real part while 3i is the imaginary part. Complex numbers can also be created with the complex() function, with real and imaginary as the arguments.
## [1] 3+7i
## [1] "complex"
Let's try other values that fit the complex data type
2+5i
## [1] 2+5i
## [1] "complex"
7 + 6i
## [1] 7+6i
## [1] "complex"
4i - 1
## [1] -1+4i
## [1] "complex"
A complex value can consist of the imaginary part only, without a real part; R will assume the real part to be 0 (zero). For instance:
## [1] 0+3i
## [1] "complex"
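The complex-number examples above can be sketched as below. The values are taken from the text and the printed results; the variable names z, w and v are hypothetical.

```r
# Written directly: 2 is the real part, 3i the imaginary part
z = 2 + 3i
class(z)            # "complex"

# Built with complex(), using the real and imaginary arguments
w = complex(real = 3, imaginary = 7)
w                   # 3+7i
class(w)            # "complex"

# The order of the terms does not matter; R prints the real part first
4i - 1              # -1+4i

# An imaginary part on its own gets a real part of 0
v = 3i
v                   # 0+3i
Re(v)               # 0
Im(v)               # 3
```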
Practical Exercise
Find the data type of the following values; One of them is a numeric element
3i + 8
5 - 1i
4i
12
- Raw
They represent a vector of raw bytes, which R prints in hexadecimal. They are used for storing binary data. Example:
## [1] 44 4e 41
## [1] "raw"
## [1] "character"
“Hello World” can be represented as in the results below when converted to the raw data type
## [1] 48 65 6c 6c 6f 20 57 6f 72 6c 64
## [1] "raw"
Numeric values can also be converted to raw vectors:
## [1] 1b
## [1] "raw"
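The raw outputs above can be reproduced with the sketch below. The bytes 44 4e 41 decode to the text "DNA" and 1b is the hexadecimal form of 27, so those inputs are inferred from the printed results; the variable name dna_raw and the rawToChar() round trip are assumptions.

```r
# Convert a character value to its raw bytes (printed in hexadecimal)
dna_raw = charToRaw("DNA")
dna_raw                      # 44 4e 41
class(dna_raw)               # "raw"

# Converting the bytes back gives a character value again
class(rawToChar(dna_raw))    # "character"

# A longer string, one byte per character (20 is the space)
charToRaw("Hello World")     # 48 65 6c 6c 6f 20 57 6f 72 6c 64

# Numbers from 0 to 255 can be converted with as.raw()
as.raw(27)                   # 1b
class(as.raw(27))            # "raw"
```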
Practical Exercise
Convert the following values to raw data types. Hint: use the charToRaw() function for character data types and as.raw() for other data types.
"Vipin"
27
69.0
FALSE
12L
2.2 Data Structures
A data structure organizes one or more data values in a specific layout. The data structures covered in this chapter are:
Vector
Matrix
Data frame
2.2.1 Vector
A vector is a single entity consisting of a collection of values. Vectors are versatile and form the basis of many operations in statistics and data manipulation, so a good knowledge of vectors is important for effective programming in R. Vectors are created using the c() function. Here is an example of a vector.
## [1] 23 67 98 34 98 21
Practical Exercise
Create a vector named ages, insert the following values: 21, 32, 22, 24, 27, 54, 20, 13, and print it out on the console.
The class() function returns the data type of the values stored in a vector.
## [1] "numeric"
The vector “marks” consists of only numeric values.
The is.vector() function is used to check if a variable is a vector. It returns a boolean value: TRUE if the variable in question is truly a vector and FALSE otherwise.
## [1] TRUE
Unlike a matrix or a data frame, a vector has no dimensions, so asking for its dimensions returns NULL.
## NULL
The length() function is used to count the number of elements in a vector. In our case the vector marks, marks = c(23, 67, 98, 34, 98, 21), has six elements, therefore the length() command will return 6.
## [1] 6
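The vector checks walked through above can be collected in one sketch. The vector marks and its values come from the text; the use of dim() for the NULL result is an assumption about the original chunk.

```r
# Create a vector with c() and print it
marks = c(23, 67, 98, 34, 98, 21)
marks              # 23 67 98 34 98 21

# Data type of the elements
class(marks)       # "numeric"

# Confirm that marks is a vector
is.vector(marks)   # TRUE

# A plain vector has no dimensions
dim(marks)         # NULL

# Number of elements
length(marks)      # 6
```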
Practical Exercise
Create a vector named height with its elements/values as 120.1, 118, 123.4, 130.8, 115.2 and do the following:
- print it out to the console using the print() function.
- find the data type of its elements using the class() function.
- use the is.vector() function to find out if it is really a vector.
- count the number of elements in the vector using the length() function.
An index is the position of an element in a vector. In R, indexing starts at 1. Let's find the third element using index 3.
## [1] 98
The value “98” is at index 3, that is, the third in the vector. The first value/element of a vector is indexed 1, as shown when we retrieve the first value in the vector marks.
## [1] 23
The sequence goes on: the second, third, fourth, fifth … values are indexed as 2, 3, 4, 5 … respectively, i.e. the n^th value is indexed as n.
Vectors can also be sliced to obtain values over a range of indices. For instance, the code below shows how to retrieve the second to the fourth values; the result is itself a vector.
## [1] 67 98 34
## [1] TRUE
An element at a specific index in a vector can be excluded by adding a - sign before the index value.
## [1] 23 98 34 98 21
The rev() command is used to reverse the order of elements in a vector.
## [1] 21 98 34 98 67 23
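The indexing operations described above can be sketched as follows, using the marks vector from the text. The excluded index 2 is inferred from the printed result, in which the value 67 is missing.

```r
marks = c(23, 67, 98, 34, 98, 21)

# Single elements by position (indexing starts at 1)
marks[3]                 # 98
marks[1]                 # 23

# A slice from the second to the fourth element is still a vector
marks[2:4]               # 67 98 34
is.vector(marks[2:4])    # TRUE

# A negative index drops the element at that position
marks[-2]                # 23 98 34 98 21

# Reverse the order of the elements
rev(marks)               # 21 98 34 98 67 23
```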
Practical Exercise
Create a vector named ages and insert the following values: 13, 59, 27, 22, 19, 31, 43. Use it to answer the questions below.
- Print out the vector ages to the console.
- Store the third element in a variable called my_age and print it out.
- Extract the values from the second to the fifth element and print them out.
- Exclude the third element.
- Reverse the order of the elements in the vector.
2.2.1.1 Mathematical Operations in a vector
The summary/descriptive statistics are calculated by the summary() command.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 25.75 50.50 56.83 90.25 98.00
sum(), median(), and mean() are used to calculate the total, median and average of the values in a vector. The standard deviation is available separately through sd().
## [1] "MARKS"
## [1] "TOTAL: 341"
## [1] "MEDIAN: 50.5"
## [1] "AVERAGE: 56.8333333333333"
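The statistics printed above can be reproduced with the sketch below. The marks vector is taken from the text; the use of print() and paste() to build the labelled lines is an assumption about how the original output was formatted.

```r
marks = c(23, 67, 98, 34, 98, 21)

# Minimum, quartiles, median, mean and maximum in one call
summary(marks)

# Individual statistics, labelled with paste()
print("MARKS")
print(paste("TOTAL:", sum(marks)))        # "TOTAL: 341"
print(paste("MEDIAN:", median(marks)))    # "MEDIAN: 50.5"
print(paste("AVERAGE:", mean(marks)))     # "AVERAGE: 56.8333333333333"

# Standard deviation, not shown in the output above
sd(marks)
```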
- Vector multiplication and division - vectors can be multiplied or divided by a scalar value or by another vector of the same length and numeric data type. For instance, the vector marks=c(23, 67, 98, 34, 98, 21) is multiplied below by the scalar value 2, which multiplies each element in the vector by two.
marks = c(23, 67, 98, 34, 98, 21)
# Multiply each element in the vector by 2
double_marks = 2 * marks
marks
## [1] 23 67 98 34 98 21
double_marks
## [1] 46 134 196 68 196 42
The values in the vector marks can also be scaled down to a half when multiplied by the scalar value 0.5.
marks = c(23, 67, 98, 34, 98, 21)
# Multiply by 0.5 to scale the marks by a half
half_marks = 0.5 * marks
marks
## [1] 23 67 98 34 98 21
half_marks
## [1] 11.5 33.5 49.0 17.0 49.0 10.5
Alternatively, instead of multiplying the vector by 0.5, it can be divided by the scalar value 2. This is what is referred to as vector division.
marks = c(23, 67, 98, 34, 98, 21)
# Scale down the marks by a half by dividing by 2 instead of multiplying by 0.5
half_marks = marks/2
half_marks
## [1] 11.5 33.5 49.0 17.0 49.0 10.5
Practical Exercise
Create a vector with the following values: 67, 55, 60, 59, 57.2, 71, 62, 66, 70 and name the vector weights. Use the variable weights to solve the following problems.
- Calculate the: i. median weight ii. mean (average) weight iii. total weight when summed together
- Calculate the summary statistics using the summary() function.
- Add 10 to the variable weights and name the answer added_weights.
- Subtract 15 from weights and name it reduced_weights.
- Scale the weights by multiplying the vector by 1.5.
- Scale down the weights to a third by dividing the vector by 3.
Vector by vector multiplication and division
Two or more numeric vectors of equal length can be multiplied or divided by each other. The example below demonstrates vector by vector multiplication of vector a: 3, 5, 1 and vector b: 7, 3, 9. Each value is multiplied by the value at the corresponding index in the other vector such that:
- 3 is multiplied by 7 to give 21
- 5 is multiplied by 3 to give 15
- 1 is multiplied by 9 to give 9
The resultant vector is now 21 15 9.
## [1] 21 15 9
## [1] 21 15 9
The same vectors can also be divided by each other provided they are of the same length and all have numeric values. The order of division matters. In the first case vector a is divided by vector b such that:
- 3 is divided by 7 to give 0.4285714
- 5 is divided by 3 to give 1.6666667
- 1 is divided by 9 to give 0.1111111
The resultant vector is now 0.4285714 1.6666667 0.1111111.
## [1] 0.4285714 1.6666667 0.1111111
In the second case the order of division is reversed, with vector b divided by vector a (b/a instead of a/b), such that:
- 7 is divided by 3 to give 2.333333
- 3 is divided by 5 to give 0.600000
- 9 is divided by 1 to give 9.000000
The resultant vector is now 2.333333 0.600000 9.000000.
## [1] 2.333333 0.600000 9.000000
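The element-by-element arithmetic described above can be sketched as follows, using the vectors a and b from the text.

```r
a = c(3, 5, 1)
b = c(7, 3, 9)

# Multiplication pairs up elements at the same index
a * b    # 21 15 9

# Division is also element by element, and the order matters
a / b    # 0.4285714 1.6666667 0.1111111
b / a    # 2.333333 0.600000 9.000000
```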
However, when multiplying vectors of unequal length, the shorter one is recycled (replicated) to match the length of the longer vector. R returns a warning when the length of the longer vector is not a multiple of the length of the shorter one. The case below shows how the vectors e=c(1,2,3,4,5) and f=c(1,2) are multiplied.
- vector f=c(1,2) is replicated to match the length of vector e, so it is effectively treated as c(1,2,1,2,1). The process of vector by vector multiplication is then followed.
## Warning in e * f: longer object length is not a multiple of shorter object
## length
## [1] 1 4 3 8 5
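A short sketch of the recycling behaviour described above. The vectors e and f come from the text; the vector g is a hypothetical addition showing the case where no warning is raised.

```r
e = c(1, 2, 3, 4, 5)
f = c(1, 2)

# f is recycled as 1 2 1 2 1; R warns because 5 is not a multiple of 2
e * f    # 1 4 3 8 5

# Hypothetical second case: length 4 is a multiple of 2, so no warning
g = c(1, 2, 3, 4)
g * f    # 1 4 3 8
```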
Multiple vectors can be concatenated/combined into one larger vector by passing them to c().
## [1] 3 5 1
## [1] 7 3 9
## [1] 3 5 1 7 3 9 3 5 1
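The concatenation shown above can be sketched as below. The printed result suggests that a, b and then a again were combined; the variable name combined is hypothetical.

```r
a = c(3, 5, 1)
b = c(7, 3, 9)
a    # 3 5 1
b    # 7 3 9

# c() accepts whole vectors and joins them end to end
combined = c(a, b, a)
combined           # 3 5 1 7 3 9 3 5 1
length(combined)   # 9
```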
Practical Exercise
Create two vectors, vector1: 4, 6, 12, 7 and vector2: 7, 3, 5, 10. Use the two vectors to solve the following questions.
- Create vector3 by multiplying vector1 and vector2. Print it out.
- Create vector4a by dividing vector1 by vector2. Print it out.
- Create vector4b by dividing vector2 by vector1. Print it out.
- Is there a difference between vector4a and vector4b? If there is, what brought the difference? Write the answer as a comment.
- Create another vector5: 4, 6 and multiply it with vector1 to come up with vector6. Print it out.
- Concatenate vector1, vector2 and vector5 to come up with a giant_vector. Print it out.
2.2.1.2 Character Vectors
Vectors can also contain character data types, for instance:
## [1] "My" "name" "is" "Vipin" "Singh"
The elements of a character vector can be combined into a single string. For instance, the vector my_name = c("My", "name", "is", "Vipin", "Singh") is combined into "My name is Vipin Singh". The collapse argument (of a function such as paste()) is used as below:
## [1] "My name is Vipin Singh"
The summary() function also works on a character vector. It reports:
Count/length
Class (data type)
Mode
## Length Class Mode
## 5 character character
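The character-vector steps above can be sketched as follows. The vector contents come from the printed output; using paste() with collapse is an assumption about the original chunk, since the text names only the collapse argument.

```r
# A vector whose elements are character values
my_name = c("My", "name", "is", "Vipin", "Singh")
my_name

# collapse joins all elements into one string, separated by a space
full_sentence = paste(my_name, collapse = " ")
full_sentence       # "My name is Vipin Singh"

# summary() reports length, class and mode for character vectors
summary(my_name)
```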
2.2.1.3 Vectors with mixed data types
A vector can also be created from a mix of character values and numeric values, for instance:
## [1] "1" "two" "3" "three"
However, R stores every element of such a vector as character data, including the numeric ones. An element can be converted back to numeric by:
## [1] 3
and to an integer by:
## [1] 1
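A sketch of the mixed-type behaviour above. The element values come from the printed vector; the variable name mixed and the choice of as.numeric() and as.integer() on the third and first elements are assumptions inferred from the results 3 and 1.

```r
# Mixing numbers and text coerces everything to character
mixed = c(1, "two", 3, "three")
mixed                  # "1" "two" "3" "three"
class(mixed)           # "character"

# Convert single elements back to numbers
as.numeric(mixed[3])   # 3
as.integer(mixed[1])   # 1
```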
2.2.1.4 Named Vectors
Names can be assigned to the elements of a vector like:
## EcoR1 HindIII Pst1
## "GAATTC" "AAGCTT" "CTGCAG"
The names of the values are accessed by:
## [1] "EcoR1" "HindIII" "Pst1"
A vector element can be accessed using its name:
## EcoR1
## "GAATTC"
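The named vector above can be sketched as follows. The names and values come from the printed output; the variable name enzymes is hypothetical.

```r
# Each element is given a name with the name = value form inside c()
enzymes = c(EcoR1 = "GAATTC", HindIII = "AAGCTT", Pst1 = "CTGCAG")
enzymes

# List the names of the elements
names(enzymes)       # "EcoR1" "HindIII" "Pst1"

# Access an element by its name instead of its index
enzymes["EcoR1"]     # named element "GAATTC"
```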
2.2.1.5 Generating number series as vectors
The seq function in R is used to generate sequences of numbers. It takes several arguments, including from, to, by, and length.out, among others, to specify the range and increment of the sequence. Here’s a brief overview of its usage:
from: The starting value of the sequence.
to: The end value of the sequence.
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## [1] "integer"
by: The increment between consecutive values in the sequence.
## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
## [16] 7.5 8.0 8.5 9.0 9.5 10.0
length.out: The desired length of the sequence.
## [1] 0.0000000 0.6666667 1.3333333 2.0000000 2.6666667 3.3333333 4.0000000
## [8] 4.6666667 5.3333333 6.0000000
## [1] 0 1 2 3 4 5 6
along.with: An optional vector argument; the output sequence has the same length as this vector.
## [1] 1 2 3
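The seq() arguments listed above can be sketched as below. The ranges are inferred from the printed sequences; the exact calls in the original chunks, and the three-letter vector passed to along.with, are assumptions.

```r
# from and to: whole numbers 1 to 20 (the colon operator is a shortcut)
seq(from = 1, to = 20)
1:20
class(1:20)                               # "integer"

# by: step of 0.5 between 0 and 10 (21 values)
seq(from = 0, to = 10, by = 0.5)

# length.out: 10 evenly spaced values between 0 and 6
seq(from = 0, to = 6, length.out = 10)

# length.out = 7 gives steps of exactly 1
seq(from = 0, to = 6, length.out = 7)     # 0 1 2 3 4 5 6

# along.with: one index per element of another vector
seq(along.with = c("a", "b", "c"))        # 1 2 3
```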
2.2.1.6 Null data points in vectors
Missing (blank) values are represented in R by NA (Not Available), for instance:
## [1] 1
Built-in functions for mathematical operations, such as sum(), return NA when a vector contains NA values unless those values are removed/ignored with na.rm = TRUE.
#sum(marks) #returns NA rather than a number
sum(marks, na.rm = TRUE) #remove NA values before calculating the sum
## [1] 417
## [1] 87
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 65.0 78.0 87.0 83.4 89.0 98.0 1
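The missing-value results above can be reproduced with the sketch below. The five known values are inferred from the printed summary and the total of 417; their order, the position of the NA, and the calls that produced the results 1 and 87 are assumptions.

```r
# A marks vector with one missing value (order is an assumption)
marks = c(78, 65, NA, 98, 87, 89)

# Count the missing values
sum(is.na(marks))              # 1

# Without na.rm the result is NA
sum(marks)                     # NA

# na.rm = TRUE drops the NA before calculating
sum(marks, na.rm = TRUE)       # 417
median(marks, na.rm = TRUE)    # 87

# summary() reports the NA count in its own column
summary(marks)
```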
2.2.2 Matrix
A matrix is a two-dimensional data structure that contains a single class of data. The code below shows how one can produce a matrix from a vector.
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
A vector of values 1 to 9 is converted to a matrix, with the values arranged column-wise by default.
A matrix has two dimensions: rows and columns. In R, structures with more than two dimensions are arrays.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# find the dimensions of the matrix
dim(data1)
## [1] 3 3
The is.matrix() function is used to confirm whether a given variable is a matrix; it returns a boolean value.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# confirm if `data1` is really a matrix
is.matrix(data1)
## [1] TRUE
A matrix can also be created row-wise from a vector.
vector1 = seq(1, 9)
## create a matrix by row
data2=matrix(vector1, ncol=3, byrow=TRUE)
data2 # is a transpose of data1
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
R recognizes a matrix as both a matrix and an array, so class() returns both:
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# find the data type of `data1`
class(data1)
## [1] "matrix" "array"
To access a specific data point in a matrix, the matrix is indexed by row and then by column, for instance matrix_data[row_index, column_index].
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
# retrieve the value in the third row second column in `data1`
data1[3, 2]
## [1] 6
To access a single row, leave the column index empty. In this case we retrieve the second row, which is returned as a vector.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
row2 = data1[2,] # access row 2
is.vector(row2) # the extracted row is a vector
## [1] TRUE
To access a single column, leave the row index empty.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
col3=data1[,3] # access column 3
is.vector(col3) # the extracted column is a vector
## [1] TRUE
Count the number of rows in a matrix
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
nrow(data1)
## [1] 3
data1 has 3 rows.
Count the number of columns in a matrix
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
ncol(data1)
## [1] 3
2.2.2.1 Mathematical Operations in a matrix
Matrix Addition: A number can be added to a matrix, or two matrices with the same number of rows and columns can be added together.
vector1 = seq(1, 9)
# Convert to matrix
## create by column
data1=matrix(vector1, ncol=3)
data2 = data1 + 3
data2
## [,1] [,2] [,3]
## [1,] 4 7 10
## [2,] 5 8 11
## [3,] 6 9 12
For instance, the code snippet above demonstrates adding a numeric value to a matrix. Adding the value 3 to a matrix increases each value in the matrix by 3. To demonstrate matrix to matrix addition, we will create two matrices of equal dimensions and then add them to each other.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 7 9 11
## [3,] 13 15 17
## [,1] [,2] [,3]
## [1,] 2 5 8
## [2,] 11 14 17
## [3,] 20 23 26
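The matrix to matrix addition above can be sketched as follows. The contents of the two matrices are read off the printed output; the names matrix1, matrix2 and matrix_sum and the way the matrices are constructed are assumptions.

```r
# First matrix: 1 to 9 filled row by row
matrix1 = matrix(1:9, ncol = 3, byrow = TRUE)
matrix1

# Second matrix: odd numbers 1 to 17 filled row by row
matrix2 = matrix(seq(1, 17, by = 2), ncol = 3, byrow = TRUE)
matrix2

# Addition pairs up cells in the same position
matrix_sum = matrix1 + matrix2
matrix_sum    # first row 2 5 8, last row 20 23 26
```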
Matrix Subtraction: The same concept of matrix addition applies to matrix subtraction as well.
## [,1] [,2] [,3]
## [1,] 0 1 2
## [2,] 3 4 5
## [3,] 6 7 8
Subtracting 1 from data1 reduces each value in the matrix by 1. Let's now subtract data1 from data2.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 7 9 11
## [3,] 13 15 17
## [,1] [,2] [,3]
## [1,] 0 1 2
## [2,] 3 4 5
## [3,] 6 7 8
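A sketch of the two subtractions above. The matrices are reconstructed from the printed output, where both are filled row by row; the names matrix1 and matrix2 stand in for the matrices the text calls data1 and data2, and that mapping is an assumption.

```r
matrix1 = matrix(1:9, ncol = 3, byrow = TRUE)
matrix2 = matrix(seq(1, 17, by = 2), ncol = 3, byrow = TRUE)

# Subtracting a number lowers every cell by that number
minus_one = matrix1 - 1
minus_one     # rows: 0 1 2 / 3 4 5 / 6 7 8

# Subtracting one matrix from another works cell by cell
difference = matrix2 - matrix1
difference    # rows: 0 1 2 / 3 4 5 / 6 7 8
```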
Matrix Multiplication (scalar): A matrix can be multiplied by a scalar, whereby the scalar value multiplies every cell in the matrix.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 5 10 15
## [2,] 20 25 30
## [3,] 35 40 45
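Matrix multiplication (element-wise): multiplying two matrices of equal dimensions with the * operator multiplies each cell by the cell in the same position of the other matrix, which is what the result below shows. The row-by-column matrix product, in which each row of the first matrix is combined with each column of the second, is computed with the %*% operator instead.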
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 7 9 11
## [3,] 13 15 17
## [,1] [,2] [,3]
## [1,] 1 6 15
## [2,] 28 45 66
## [3,] 91 120 153
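The printed result above (1 6 15 / 28 45 66 / 91 120 153) is what the * operator produces, cell by cell. The sketch below reproduces the scalar and element-wise results and contrasts them with the row-by-column product from %*%. The matrix names and the way they are built are assumptions based on the printed matrices.

```r
matrix1 = matrix(1:9, ncol = 3, byrow = TRUE)
matrix2 = matrix(seq(1, 17, by = 2), ncol = 3, byrow = TRUE)

# Scalar multiplication: every cell times 5
scaled = matrix1 * 5
scaled         # rows: 5 10 15 / 20 25 30 / 35 40 45

# Element-wise multiplication: cells in the same position are multiplied
elementwise = matrix1 * matrix2
elementwise    # rows: 1 6 15 / 28 45 66 / 91 120 153

# Row-by-column matrix product, a different operation
product = matrix1 %*% matrix2
product        # top-left cell is 1*1 + 2*7 + 3*13 = 54
```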
Matrix division: a matrix can be divided by a scalar, which divides every cell in the matrix by that value.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 0.5 1.0 1.5
## [2,] 2.0 2.5 3.0
## [3,] 3.5 4.0 4.5
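The division result above is consistent with dividing the first matrix by 2. A minimal sketch, assuming that divisor and a hypothetical matrix name:

```r
matrix1 = matrix(1:9, ncol = 3, byrow = TRUE)
matrix1

# Dividing by a scalar divides every cell
halved = matrix1 / 2
halved    # rows: 0.5 1.0 1.5 / 2.0 2.5 3.0 / 3.5 4.0 4.5
```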
2.2.3 Data frame
A data frame is a two-dimensional data structure, like a 2D array/matrix with rows and columns. Unlike a matrix, its columns can hold different data types.
Let's convert a matrix into a data frame.
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
The above data shows the scores of different students in different subjects. The column names are automatically generated by R; however, custom column names can be assigned as below.
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
# Create column names
headers=c("Students", "Genomics", "Proteomics", "Microbiology", "Biostatistics")
colnames(data)=headers #add column names
data
## Students Genomics Proteomics Microbiology Biostatistics
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
A row-wise addition can be performed on a data frame with rowSums() to find the total score for each student in the four units.
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
## Add a new column with total marks obtained
data$total_marks=rowSums(data[, c(2, 3, 4, 5)]) #add from second to fifth column
data
## Students X1 X2 X3 X4 total_marks
## 1 Pragya 1 4 7 10 22
## 2 Deepika 2 5 8 11 26
## 3 Chandran 3 6 9 12 30
Find the average score for each student. rowMeans() is used to calculate the average of each row/record.
vector1 = c(1:12)
matrix1 = matrix(vector1, ncol=4) #create a matrix from the vector
# Adding a column student
Students=c("Pragya", "Deepika", "Chandran")
data = data.frame(Students, matrix1)
data
## Students X1 X2 X3 X4
## 1 Pragya 1 4 7 10
## 2 Deepika 2 5 8 11
## 3 Chandran 3 6 9 12
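## Add a new column with the average marks obtained
data$average_marks=rowMeans(data[, c(2, 3, 4, 5)]) #average from second to fifth column
data
## Students X1 X2 X3 X4 average_marks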
## 1 Pragya 1 4 7 10 5.5
## 2 Deepika 2 5 8 11 6.5
## 3 Chandran 3 6 9 12 7.5
2.3 Hands-on Exercises
- Basic Data Types
- Create an integer variable age with value 25.
- Create a numeric variable height representing height in meters.
- Define a string variable name with the value "Alex".
- Create a boolean variable is_student indicating whether someone is a student or not.
- Create a complex number variable z representing 2 + 3i.
- Define a raw data variable byte_value that stores the hexadecimal value 0x1a.
- Operators
- Add two integers: 12 + 8.
- Divide two numbers: 45.5 / 5.
- Create a logical comparison to check if age is greater than 20.
- Create a logical comparison to check if height is equal to 1.75.
- Vectors
- Create a numeric vector numbers with values 2, 4, 6, 8, 10.
- Create a character vector colors containing "red", "blue", "green", "yellow", "purple".
- Append the value 12 to the vector numbers.
- Matrix
- Create a 3x3 matrix A with values from 1 to 9.
- Create another 3x3 matrix B with values from 9 to 1.
- Dataframes
- Create a dataframe students_df with the columns Name, Age, and Grade for three students.
- Add a new column Gender to the data frame students_df.
- Vector and Matrix Operations
- Add the vectors c(2, 4, 6) and c(1, 3, 5).
- Multiply the matrices A and B from the Matrix question above.