Day 15: What no one tells you about data manipulation

I remember the first time I attended a coding class back in university.

It was 4 P.M. We were tired from regular classes and the last thing we wanted to do was learn something new.

But as they say, no pain, no gain *shrugs*

The first thing I hear?

"Let's look at this syntax."

Syn-- what?

Now, keep in mind, I've never coded in my life.

Even the word 'syntax' is new to me

(Tip for people who teach code: keep things simple for your audience.)

Anyway...

The teacher proceeds to write some code (while I'm still trying to figure out what all the technical jargon means).

No prizes for guessing what I learned that day

Nothing. Zero. Zilch.

But why is this relevant to data manipulation?

Let me explain

You see... R isn't that complex once you realise what's happening (at least, not yet).

However... when you combine multiple functions, you also need to understand where each one fits in.

So far in data manipulation, we've seen a few functions from the dplyr package:

filter(), arrange(), distinct()

All these functions have one thing in common:

They work on the rows of a dataset.
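Here's a minimal sketch of those three row functions in action, assuming dplyr is installed. It uses mtcars, a dataset that ships with R, rather than any data from earlier days, and the conditions (6-cylinder cars, sorting by mpg, unique gear values) are just picked for illustration.

```r
library(dplyr)

# mtcars comes with base R, so no extra data package is needed
cars <- mtcars

# filter(): keep only the rows that meet a condition (here, 6-cylinder cars)
six_cyl <- filter(cars, cyl == 6)

# arrange(): reorder those rows, from highest to lowest mpg
by_mpg <- arrange(six_cyl, desc(mpg))

# distinct(): drop duplicate rows, here keeping one row per gear value
gear_values <- distinct(cars, gear)

print(by_mpg)
print(gear_values)
```

Notice that none of these touch the columns: filter() decides which rows stay, arrange() decides their order, and distinct() removes repeats.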

dplyr's functions are organised into four groups, based on what they work on:

  1. rows

  2. columns

  3. groups

  4. joins (for tables)
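To see the four groups side by side, here's a rough sketch that uses one or two functions from each, again with the built-in mtcars data and assuming dplyr is installed. It goes a little beyond the three functions above (select(), mutate(), group_by(), summarise() and left_join() are the extras), and the engine_labels table is made up purely so there's something to join.

```r
library(dplyr)

# 1. Rows: keep only cars heavier than 3,000 lbs (wt is in 1000s of lbs)
heavy <- mtcars %>% filter(wt > 3)

# 2. Columns: keep three columns, then add a new one
heavy <- heavy %>%
  select(mpg, cyl, wt) %>%
  mutate(kpl = mpg * 0.425144)  # miles per US gallon -> km per litre

# 3. Groups: summarise each cylinder group separately
by_cyl <- heavy %>%
  group_by(cyl) %>%
  summarise(avg_kpl = mean(kpl), n_cars = n())

# 4. Joins: attach labels from a second, hypothetical lookup table
engine_labels <- data.frame(
  cyl = c(4, 6, 8),
  engine = c("small", "medium", "large"),
  stringsAsFactors = FALSE
)
by_cyl <- left_join(by_cyl, engine_labels, by = "cyl")

print(by_cyl)
```

Each step only deals with one part of the data: rows first, then columns, then groups, then a second table.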

Once you know which part of the dataset you're working on, it's much easier to pick the right function.

Data manipulation, or even data analysis, gets easier when you think in terms of structure.

What are you trying to do?

Changing rows? Great. Focus on that first. Columns next? Move on to those.

The point is:

Coding gets easier when you break it down and understand where each function fits in.

I don't want you to feel lost like I did in that class.

If something doesn't make sense, please reach out :)

Some of you already are, and that's great!

Happy coding 
