Day 15: What no one tells you about data manipulation
I remember my first coding class back in university.
It was 4 P.M. We were tired from regular classes and the last thing we wanted to do was learn something new.
But as they say, no pain no gain *shrugs*
The first thing I hear?
"Let's look at this syntax."
Syn-- what?
Now, keep in mind, I've never coded in my life.
Even the word 'syntax' is new to me
(tip to people who teach code - keep things simple for your audience)
anyway
The teacher proceeds to write some code (while I'm still trying to figure out what the technical jargon means)
No prizes for guessing what I learned that day
Nothing. Zero. Zilch.
But why is this relevant to data manipulation?
Let me explain
You see... R is not complex when you realise what's happening (at least not yet)
However... when you bring together multiple functions, it's also important to understand where they all fit in
In data manipulation, we've seen a few functions from the dplyr package
filter, arrange, distinct
All these functions have one thing in common -
The main focus is on the rows of a dataset.
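Here's a minimal sketch of those three row functions in action. The orders table is made up for illustration (it's not from a real dataset), and it assumes you have dplyr installed.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
orders <- data.frame(
  customer = c("Asha", "Ben", "Asha", "Chen", "Ben"),
  amount   = c(120, 45, 120, 80, 45)
)

# filter(): keep only the rows that meet a condition
big_orders <- filter(orders, amount > 50)

# arrange(): reorder the rows (here, largest amount first)
sorted_orders <- arrange(orders, desc(amount))

# distinct(): drop rows that are exact duplicates
unique_orders <- distinct(orders)
```

Notice that the columns never change. Only the rows get filtered, reordered, or de-duplicated.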
The functions in dplyr are organised into four groups:
rows
columns
groups
joins (for tables)
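To make the four groups concrete, here's a rough sketch with one function from each. The sales and managers tables are invented for this example, and the functions I picked (filter, mutate, group_by with summarise, left_join) are just one choice per group, not the full list.

```r
library(dplyr)

# Hypothetical toy tables, invented for this example
sales <- data.frame(
  region = c("North", "South", "North", "East"),
  units  = c(10, 4, 7, 12),
  price  = c(5, 5, 6, 4)
)
managers <- data.frame(
  region  = c("North", "South", "East"),
  manager = c("Priya", "Tom", "Lena")
)

# Rows: keep only sales with at least 5 units
rows_step <- filter(sales, units >= 5)

# Columns: add a new column computed from existing ones
cols_step <- mutate(rows_step, revenue = units * price)

# Groups: collapse to one row per region with the total revenue
groups_step <- summarise(group_by(cols_step, region), total = sum(revenue))

# Joins: bring in the manager for each region from a second table
joined <- left_join(groups_step, managers, by = "region")
```

Each step touches one aspect of the data, so at any point you can ask: am I changing rows, columns, groups, or combining tables?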
When working on specific aspects of the dataset, it gets easier to understand which functions to use.
Data manipulation, or even data analysis, gets easier when you think in terms of structure.
What are you trying to do?
Change rows? Great. Focus on that first. Columns next? Move on to those.
The point is -
Coding is easy when you break it down and understand where each function fits in.
I don't want you to feel lost like I did in that class.
If something doesn't make sense, please reach out :)
Some of you already have, and that's great!
Happy coding