In Linux, you often need to identify and manage large files and directories to optimize storage and system performance. In this article, we will walk through several methods for finding the files and directories that use the most disk space on Linux.

Using Built-in Utilities to Check Disk Usage

Linux ships with several command-line utilities that you can run from Bash to check and monitor disk usage. These utilities are readily available and don’t require any additional installation.

Here are some of the commonly used built-in utilities for checking disk usage in Linux:

1. Identifying Large Files

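To find the 10 largest files and directories under the current directory:

du -a . | sort -nr | head -n 10

This command uses du -a to report the disk usage of every file and directory (without -a, du lists directories only), sort -nr to sort the results numerically in descending order, and head to display the top 10 entries. Each directory total includes its contents, so parent directories usually appear at the top of the list.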
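To restrict the listing to regular files only, find can select the files and du can report their sizes. The following is a minimal sketch, assuming GNU find and coreutils; the function name largest_files is hypothetical and not part of any standard tool.

```shell
# largest_files DIR [N]: print the N largest regular files under DIR
# as "<size in KiB><TAB><path>", largest first. (Hypothetical helper name.)
largest_files() {
    dir=${1:-.}
    count=${2:-10}
    # -type f restricts the listing to regular files;
    # du -k reports allocated disk space in 1 KiB units.
    find "$dir" -type f -exec du -k {} + 2>/dev/null |
        sort -nr |
        head -n "$count"
}
```

For example, largest_files . 10 lists the ten largest files under the current directory without any directory totals mixed in.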

2. Identifying Large Directories

To list the largest directories in the current directory:

du -s * | sort -nr | head -n10

This command summarizes the size of each entry in the current working directory (the * glob matches files as well as directories, but not hidden entries), sorts the results numerically, and shows the 10 largest.
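If you want subdirectories only, including hidden ones, find can select them before du summarizes each one. This is a sketch under the assumption that GNU find is available (for -mindepth and -maxdepth); the function name largest_dirs is hypothetical.

```shell
# largest_dirs DIR [N]: print the N largest immediate subdirectories of DIR
# (hidden ones included) as "<size in KiB><TAB><path>", largest first.
largest_dirs() {
    dir=${1:-.}
    count=${2:-10}
    # -mindepth/-maxdepth 1 selects only direct children; -type d keeps directories.
    # du -sk prints one summarized total per directory, in 1 KiB units.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + 2>/dev/null |
        sort -nr |
        head -n "$count"
}
```

Running largest_dirs . prints up to ten subdirectories of the current directory, and plain files in that directory are left out.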

3. Displaying Large Files with GNU Tools

You can also use clever bash scripting with the find command to list the biggest files and directories.

Method 1

The following command finds and lists files larger than 20,000 kilobytes (roughly 20 megabytes) in the user’s home directory (~). It then sorts these files by size in reverse order and displays the file names along with their respective sizes:

find ~ -type f -size +20000k -exec ls -lh {} \; 2> /dev/null | awk '{ print $NF ": " $5 }' | sort -hrk 2,2

Here’s a step-by-step explanation of how this command works:

  1. find ~ -type f -size +20000k: This part of the command initiates the search for files in the home directory (~). If you want to use a different directory, replace ~ with the path of your directory. The -type f option specifies that we are looking for regular files, and the -size +20000k option filters files larger than 20,000 kilobytes.
  2. -exec ls -lh {} \;: For each file that matches the criteria, the find command executes the ls -lh command, which lists detailed information about the file. The {} placeholder is replaced with the current file’s name during execution.
  3. 2> /dev/null: This part redirects any error messages (specifically, standard error) to /dev/null, effectively suppressing them. This is done to prevent error messages from being displayed in the terminal.
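  4. | awk '{ print $NF ": " $5 }': The awk command processes the output from the previous command. It extracts the file name (represented as $NF, the last whitespace-separated field) and the file size (represented as $5) and prints them in the format “filename: size”. Because $NF holds only the last field, a file name that contains spaces is cut down to its final word.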
  5. | sort -hrk 2,2: The final part of the command uses sort to arrange the file entries. The options used are as follows:
    1. -h: This tells sort to perform a human-readable sort, which is useful for sizes with suffixes like “K” or “M”.
    2. -r: This specifies a reverse order sort, listing the largest files first.
    3. -k 2,2: This instructs sort to use the second field (the size) as the key for sorting.
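Parsing ls output is fragile when file names contain spaces. As an alternative, find can print the size and path itself. The following is a minimal sketch, assuming GNU find (for -printf); the function name files_larger_than and its parameters are hypothetical.

```shell
# files_larger_than DIR MIN_KIB: list regular files under DIR whose size
# exceeds MIN_KIB kibibytes, as "<bytes><TAB><path>", largest first.
files_larger_than() {
    dir=$1
    min_kib=$2
    # %s is the file size in bytes and %p the path; printing them directly
    # avoids parsing ls output, so spaces in names are preserved.
    find "$dir" -type f -size "+${min_kib}k" -printf '%s\t%p\n' 2>/dev/null |
        sort -nr
}
```

Calling files_larger_than ~ 20000 performs the same search as the command above, with sizes shown in bytes rather than in human-readable units.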

Method 2

Alternatively, you can use the following command to find and display information about the top 20 largest regular files in the user’s home directory (~). It reports the size of each file and its path, sorted in reverse order by size.

find ~ -type f -printf '%b %p\0' | sort -rzn | head -zn 20 | tr '\0' '\n'

Here’s a step-by-step explanation of how this command works:

  1. find ~ -type f -printf '%b %p\0': This part of the command initiates a search for regular files (-type f) in the home directory (~). For each file found, it uses the -printf option to format the output as follows:
    1. %b: Represents the number of 512-byte blocks allocated for the file.
    2. %p: Represents the file’s path.
    3. \0: Terminates each output entry with a null character (\0). The null character is used to separate file entries, and is especially important when dealing with file paths that contain spaces or special characters.
  2. | sort -rzn: The output from the find command is then piped (|) to the sort command for sorting. The following options are used with sort:
    1. -r: Performs a reverse order sort, arranging the entries from largest to smallest.
    2. -z: Informs sort that the input entries are null-terminated, which is why the null character (\0) was added at the end of each entry by the find command.
    3. -n: Specifies a numerical sort, ensuring that file sizes are sorted based on their numeric values rather than lexicographically.
  3. | head -zn 20: After the sorted list is generated by sort, the output is piped to the head command.
    1. -z: Informs head that the input entries are null-terminated.
    2. -n 20: Limits the output to the first 20 entries, which are the 20 largest files based on size.
  4. | tr '\0' '\n': Finally, the tr (translate) command is used to replace the null characters (\0) in the output with newline characters (\n). This is done to format the output in a more human-readable way, with each file and its size on a separate line.
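The steps above can be wrapped in a small function so the pipeline is easy to try on any directory. This is a sketch that assumes GNU find, sort, and head (for -printf and -z); the function name top_files_by_blocks is hypothetical.

```shell
# top_files_by_blocks DIR [N]: print the N regular files under DIR that
# occupy the most 512-byte blocks, one "<blocks> <path>" entry per line.
top_files_by_blocks() {
    dir=${1:-.}
    count=${2:-20}
    find "$dir" -type f -printf '%b %p\0' 2>/dev/null |
        sort -rzn |           # numeric, descending, NUL-delimited records
        head -zn "$count" |   # keep the first N NUL-delimited records
        tr '\0' '\n'          # convert to newlines for display only
}
```

Because the records stay NUL-delimited until the final tr, file names containing spaces pass through sort and head intact.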

Note: You can also combine the above command with other utilities such as numfmt to convert file sizes into human-readable formats. For example:

find ~ -type f -printf '%b %p\0' | sort -rzn | head -zn 20 | numfmt -z --from-unit=512 --to=iec | tr '\0' '\n'
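To see what the numfmt options do in isolation: --from-unit=512 multiplies each input number by 512 (the block size reported by %b), and --to=iec scales the result using powers of 1024. The following quick check assumes GNU numfmt; the helper name blocks_to_human is hypothetical.

```shell
# blocks_to_human BLOCKS: convert a count of 512-byte blocks to an IEC size.
# LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
blocks_to_human() {
    LC_ALL=C numfmt --from-unit=512 --to=iec "$1"
}

blocks_to_human 2048    # 2048 * 512 = 1048576 bytes, printed as 1.0M
```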

4. Assessing Directory Usage

To check how much disk space each top-level directory occupies, use the following commands:

cd /
du -sh * | grep -E '^[0-9.,]+G'

The above commands show the directories whose disk usage is measured in gigabytes. The pattern is anchored to the size column so that names that merely contain the letter G are not matched by accident. The output helps you decide which directories or files to consider for deletion or optimization.
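Sizes can also be compared numerically instead of by matching the G suffix in human-readable output. The following is a minimal sketch, assuming GNU find and awk; the function name entries_over and its threshold parameter are hypothetical.

```shell
# entries_over DIR MIN_KIB: list entries directly under DIR that use at least
# MIN_KIB kibibytes of disk space, as "<KiB><TAB><path>", largest first.
entries_over() {
    dir=$1
    min_kib=$2
    # du -sk prints one summarized total per entry in 1 KiB units;
    # awk keeps only the lines whose first field meets the threshold.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null |
        awk -v min="$min_kib" '$1 >= min' |
        sort -nr
}
```

For example, entries_over / 1048576 lists the top-level entries that use 1 GiB or more (1048576 KiB). Directories that your user cannot read are not fully counted unless you run the function as root.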

Using Third-Party Tools to Assess Disk Usage

In addition to built-in utilities, there are many third-party tools that you can use to assess disk usage from the Linux command line.

iotop

iotop monitors disk I/O activity per process rather than disk space, which makes it a valuable tool for system administrators, DevOps professionals, and anyone responsible for system performance management. It helps you diagnose performance bottlenecks related to disk I/O, identify resource-hungry processes, and take appropriate action to optimize system performance.

Installation: Use the apt package manager to install iotop-c:

sudo apt update
sudo apt install iotop-c

Usage: After installation, you can run iotop to monitor and assess disk I/O usage. The -o option shows only the processes that are actually performing I/O, and -P lists whole processes rather than individual threads. The -a option shows accumulated I/O since iotop started instead of current bandwidth, which can be useful for identifying processes with high disk I/O over time. iotop usually needs root privileges.

Here’s an example of how to run iotop:

sudo iotop -oPa

gdu (Go Disk Usage)

This is a fast and efficient disk usage analyzer written in Go, suitable for both SSD and HDD disks. The tool allows users to analyze disk usage on their Linux system.

Installation: You can download the binary from its GitHub repository and follow the installation steps in the official repository. For Debian/Ubuntu systems, you can run the following command to install the gdu utility:

sudo apt install gdu

Usage: To analyze disk usage, run gdu with the desired directory as an argument. Use arrow keys on your keyboard to navigate different folders.

Dua (Disk Usage Analyzer)

Dua is designed to help users conveniently assess and manage disk space usage in a given directory on a Linux system. It’s optimized for performance, using parallel processing to quickly provide relevant information about disk space usage.

Additionally, Dua offers a safe and efficient way to delete unnecessary data from your storage.

Dua features an interactive mode in which you can explore your file system, and choose to delete files and directories to free up disk space. It is designed to minimize the risk of accidental deletions by using a multi-stage process, making it safe for exploration.

Installation: Dua can be installed by using the binary release for your specific system from the Dua repository.

Usage: 

  • Run dua to count the space used in the current working directory.
  • Execute dua * to count the space used by each entry in the current directory that is not hidden.
  • To learn about additional functionality such as the aggregate feature, use the dua aggregate --help command.
  • To launch into interactive mode, run dua i.

godu (Go Disk Usage)

Godu’s primary purpose is to help users identify space-consuming files and directories on their storage drives. It is intended for users who prefer a command-line tool that is fast and efficient.

It is implemented in the Go programming language, known for its efficiency and performance. This choice of language ensures that the tool can take full advantage of the system’s resources, and provide fast results.

Installation: You can download the godu binary from its GitHub repository, make it executable, and move it to a directory in your PATH.

Usage: Run godu to analyze disk usage for a specific directory. Use arrow keys on the keyboard to navigate through the file system.

ncdu (NCurses Disk Usage)

Ncdu is a terminal-based disk usage analyzer for Linux and other POSIX-like systems. It can also display how much data is shared between directories, a feature that few disk usage analyzers offer.

Installation: ncdu is available through most package managers. You can install it using apt on Debian-based systems:

sudo apt update
sudo apt install ncdu

Usage: Run ncdu, optionally followed by a directory path, to analyze disk usage in an interactive text-based interface.

These tools provide various ways to assess disk usage, and you can choose the one that best suits your needs and preferences. Most of them offer interactive terminal interfaces, while the built-in commands print plain text that is easy to use in scripts.
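If you work across several servers, a script can check which of these analyzers is installed before falling back to du. The following is a sketch only: the function name first_available_analyzer is hypothetical, the preference order is arbitrary, and it assumes each tool is installed under the command name used in this article.

```shell
# first_available_analyzer: print the name of the first disk usage analyzer
# found on PATH, falling back to du. (Hypothetical helper; the order of
# preference below is arbitrary.)
first_available_analyzer() {
    for tool in ncdu gdu dua godu; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    # None of the third-party tools were found; du is part of coreutils.
    printf 'du\n'
}
```

You could then print the result with echo "Using $(first_available_analyzer)" before launching the tool you prefer.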

Final Thoughts

In this article, we have explored how to analyze and manage disk usage on your Linux server using simple bash commands, as well as other third-party tools. If you enjoyed reading this article, you should also check out our Introduction to Bash For Loops: A Beginner’s Guide as well as our deep dive into Linux Pipes and Xargs.

If you’re struggling with managing your Linux server or need a user-friendly interface to streamline server management, you might want to consider a solution like RunCloud.

RunCloud offers a helpful GUI (Graphical User Interface) that simplifies server administration and updates, making it easier to handle your Linux servers. Start using RunCloud today!