In the R programming language, conditions can be applied to dataframe columns to produce smaller subsets of the data. While the conditions are applied, the rows that are kept retain their original order and the columns remain unmodified.
Any dataframe column in R can be referenced either by its name, df$col_name, or by its index position in the dataframe, df[[col_index]] (note that df[col_index] returns a one-column dataframe rather than a vector). The cell values of the column can then be tested with logical or comparison conditions, which yield one TRUE or FALSE value per row. Using that result as the row index of the dataframe returns only the rows that satisfy the condition.
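As a minimal sketch of this mechanism, the snippet below references a column by name and by position and then uses a comparison as the row index. The dataframe df and its columns name and score are hypothetical and exist only for illustration.

```r
# Hypothetical dataframe used only for illustration
df <- data.frame(
  name = c("a", "b", "c"),
  score = c(10, 25, 40),
  stringsAsFactors = FALSE
)

by_name  <- df$score   # column as a vector, referenced by name
by_index <- df[[2]]    # the same column, referenced by position

# A comparison on the column yields one logical value per row
keep <- df$score > 20

# Using the logical vector as the row index returns the matching rows;
# the empty column index after the comma keeps every column
subset_df <- df[keep, ]
print(subset_df)
```

The trailing comma in df[keep, ] matters: without it, the logical vector would be interpreted as a column selector.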
Cells in a dataframe can contain missing values, represented as NA, and these can be detected using the is.na() function in R.
Example:
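A minimal sketch consistent with the output below. The dataframe values are read off that output, with col1 assumed to be NA in rows 1 and 3; the variable names data_frame and data_frame_mod are placeholders.

```r
# Values inferred from the printed output; col1 is assumed to be NA in rows 1 and 3
data_frame <- data.frame(
  col1 = c(NA, "b", NA, "e", "e"),
  col2 = c(0, 2, 1, 4, 5),
  col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
print("Original dataframe")
print(data_frame)

# is.na() flags missing cells; negating it keeps the rows
# that have a value in col1
data_frame_mod <- data_frame[!is.na(data_frame$col1), ]
print("Modified dataframe")
print(data_frame_mod)
```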
Output
[1] "Original dataframe"
  col1 col2  col3
1 <NA>    0  TRUE
2    b    2 FALSE
3 <NA>    1 FALSE
4    e    4  TRUE
5    e    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
2    b    2 FALSE
4    e    4  TRUE
5    e    5  TRUE
Column values can be tested against conditions to filter and subset the data. A column can be matched against a specific value or checked against a range of values.
Example:
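A minimal sketch consistent with the output below, assuming the filter keeps the rows where col3 is TRUE (the condition is inferred from the output, not confirmed by the text). The variable names are placeholders, and the range filter at the end is an extra illustration that is not part of the printed output.

```r
# Values inferred from the printed output
data_frame <- data.frame(
  col1 = c("b", "b", "e", "e", "e"),
  col2 = c(0, 2, 1, 4, 5),
  col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
print("Original dataframe")
print(data_frame)

# Match a specific value: keep the rows where col3 is TRUE
data_frame_mod <- data_frame[data_frame$col3 == TRUE, ]
print("Modified dataframe")
print(data_frame_mod)

# Range check (not printed): keep the rows where col2 lies between 1 and 4
in_range <- data_frame[data_frame$col2 >= 1 & data_frame$col2 <= 4, ]
```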
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    e    1 FALSE
4    e    4  TRUE
5    e    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
4    e    4  TRUE
5    e    5  TRUE
Multiple conditions on column values can be combined with the logical operators & (and) and | (or). The %in% operator is used here to check whether each value matches any of the values in a specified vector.
Example:
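A minimal sketch consistent with the output below, assuming the filter keeps the rows whose col1 value is "b" or "e" (the vector passed to %in% is inferred from the output). The variable names are placeholders, and the combined condition at the end is an extra illustration that is not part of the printed output.

```r
# Values inferred from the printed output
data_frame <- data.frame(
  col1 = c("b", "b", "d", "e", "e"),
  col2 = c(0, 2, 1, 4, 5),
  col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
print("Original dataframe")
print(data_frame)

# %in% returns TRUE for every col1 value found in the given vector
data_frame_mod <- data_frame[data_frame$col1 %in% c("b", "e"), ]
print("Modified dataframe")
print(data_frame_mod)

# Combining conditions with & (not printed): both must hold for a row to be kept
combined <- data_frame[data_frame$col1 %in% c("b", "e") & data_frame$col2 > 1, ]
```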
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    e    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
4    e    4  TRUE
5    e    5  TRUE
The dplyr package, which is used for data manipulation, can be installed and loaded into the workspace.
The filter() function produces a subset of the dataframe, retaining all rows that satisfy the specified conditions. It can be applied to both grouped and ungrouped data. The conditions can use comparison operators (==, >, >=), logical operators (&, |, !, xor()), helper functions such as between() for ranges and near() for comparing floating-point numbers within a tolerance, and NA checks such as is.na() on the column values. filter() does not modify the original dataframe, so the result has to be stored in a variable.
Syntax:
filter(df, cond)
Parameters:
df – The dataframe object
cond – The condition to filter the data upon
Example:
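A minimal sketch of filter() in use, assuming dplyr is already installed. The dataframe, the variable names, and the conditions are hypothetical and chosen only for illustration.

```r
# Assumes dplyr is installed, e.g. via install.packages("dplyr")
suppressPackageStartupMessages(library(dplyr))

# Hypothetical dataframe used only for illustration
data_frame <- data.frame(
  col1 = c(NA, "b", NA, "e", "e"),
  col2 = c(0, 2, 1, 4, 5),
  col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Single condition: keep the rows where col2 is greater than 1
data_frame_mod <- filter(data_frame, col2 > 1)
print(data_frame_mod)

# Several conditions separated by commas are combined with AND:
# col2 must lie in [1, 4] and col1 must not be missing
in_range_not_na <- filter(data_frame, between(col2, 1, 4), !is.na(col1))
print(in_range_not_na)
```

Columns are referred to by their bare names inside filter(), so there is no need to repeat data_frame$ in each condition.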