“Cardamom in the biryani??!” – That is probably how you felt when the course suddenly switched from Big Data to Linux commands.
Not to worry. Knowing your way around Linux commands is essential for a Data Engineer, and for many of us it is part of the daily routine. Shell scripting may also be needed in some cases, as one piece of an end-to-end data pipeline. In this post, I will start with a basic list of Linux commands for file system navigation, and in later posts move on to the more advanced commands commonly used in Data Engineering.
Understanding Linux File System
Linux follows a tree structure, where every directory (folder) sits under the root directory (/). Each regular user in Linux typically gets their own home directory, in the format /home/<username>. Ex – /home/hdpuser.
Apart from these, every directory contains two special entries: . (denoting the current directory) and .. (denoting the parent directory).
Types of paths in Linux
1. Absolute path – It starts from the root directory (/) and points to the same file no matter where you currently are. Ex – /home/hdpuser/file1.txt
2. Relative path – It is resolved relative to the current directory, and can step up to the parent with .. Ex – ./file2.txt (in the current directory) and ../file3.txt (in the parent directory)
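To see the two kinds of paths side by side, here is a minimal sketch. It assumes a throwaway directory created with mktemp, so nothing in your real home directory is touched; the names sandbox, project, file2.txt and file3.txt are made up for the demo.

```shell
# Hypothetical sandbox: everything lives in a temporary directory.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/hdpuser/project"
echo "one" > "$sandbox/hdpuser/project/file2.txt"
echo "two" > "$sandbox/hdpuser/file3.txt"

# Make "project" the current directory.
cd "$sandbox/hdpuser/project"

# Absolute path: starts at / and works from any directory.
abs=$(cat "$sandbox/hdpuser/project/file2.txt")

# Relative paths: resolved against the current directory.
same_dir=$(cat ./file2.txt)      # . is the current directory
parent_dir=$(cat ../file3.txt)   # .. is the parent directory

echo "$abs $same_dir $parent_dir"   # prints: one one two
```

The absolute path and ./file2.txt reach the same file here only because the current directory is project; after a cd elsewhere, the relative path would point somewhere else while the absolute one would not.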
Linux Commands
- ls – List the contents of a directory. Ex – ls /home/hdpuser lists all files and directories in /home/hdpuser. ls with no argument lists all files and directories in the current directory.
ls -l = detailed list with file/directory permissions, owner, size, timestamp etc.
ls -a = list all entries, including hidden ones (names starting with a dot)
ls -t = sort by modification time, newest first
ls -r = reverse the sort order
ls -R = recursively list all directories and subdirectories inside the current directory
- touch – create an empty file/change timestamp of existing file
- mkdir – create new empty directory
- rmdir – delete empty directory
rm -r = remove a non-empty directory along with its contents (rmdir itself only removes empty directories; add -f to skip confirmation prompts)
- cp – copy files/directories (use cp -r to copy a directory with its contents)
- mv – move files/directories (used for renaming as well)
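The commands above can be tried out together in one short session. This is only a sketch under some assumptions: it runs inside a temporary directory created with mktemp, and the names raw, archive, raw_backup, a.csv and b.csv are hypothetical.

```shell
# Hypothetical sandbox so the demo cannot touch real data.
work=$(mktemp -d)
cd "$work"

mkdir raw archive        # create two empty directories
touch raw/a.csv          # create an empty file
touch raw/.hidden        # names starting with a dot are hidden

visible=$(ls raw)        # plain ls skips hidden entries
all=$(ls -a raw)         # -a also shows ., .. and .hidden
ls -l raw                # long format: permissions, owner, size, time
ls -lt raw               # long format, newest modification first
ls -ltr raw              # same, but reversed (oldest first)
ls -R .                  # walk into every subdirectory

cp raw/a.csv archive/a_copy.csv       # copy a file
cp -r raw raw_backup                  # -r is needed to copy a directory
mv archive/a_copy.csv archive/b.csv   # mv within a directory = rename
mv archive/b.csv raw/                 # mv to another directory = move

rmdir archive            # succeeds because archive is empty again
rm -r raw_backup         # a non-empty directory needs rm -r, not rmdir
```

Be careful with rm -r outside a sandbox like this one: there is no recycle bin, so whatever it removes is gone.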
That’s all for this post, guys. These are the basic commands you will use most often in the world of big data. There are many more, for checking file headers, connecting to servers, string manipulation etc., but those are for a later post.
Ciao !!