Unzipping and Merging .txt files in Python in just 3 steps: Extraction, Merging, Cleaning up

This post will guide you through unzipping and merging .txt files in Python in 3 simple steps: extracting the .txt files from zip files, merging the extracted files into a single file, and cleaning up the temporary folder. We will use the standard-library Python modules “zipfile”, “os”, and “shutil” to accomplish this task. By the end of this post, you will understand how to extract and merge .txt files in Python and will be able to adapt the code to your own projects.

Step 1: Extracting .txt files from Zip files:

The first step in the process is to extract the .txt files from the zip files. The following function, extract_txt_files(), will extract all the .txt files from the given zip file to the given temp folder.

import os
import shutil
import zipfile

def extract_txt_files(zip_path, temp_folder):
    """Extracts all the .txt files from the given zip file to the given temp folder"""
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        # Continue numbering after any .txt files already in the temp folder
        i = len([name for name in os.listdir(temp_folder) if name.endswith(".txt")]) + 1
        for member in zip_file.infolist():
            if member.filename.endswith(".txt"):
                # Copy the member straight to its numbered name, so nested paths and
                # clashing filenames inside the zip cannot overwrite earlier files
                target = os.path.join(temp_folder, f"{i}.txt")
                with zip_file.open(member) as source, open(target, "wb") as dest:
                    shutil.copyfileobj(source, dest)
                i += 1

The function uses the zipfile module’s ZipFile class to open the zip file in read mode. It then initializes a variable `i` to the number of .txt files already in the temporary folder plus 1, so that numbering continues where a previous call left off and no existing file is overwritten. Next, it loops over the members of the zip file. Each member whose filename ends with .txt is opened with zip_file.open() and copied with shutil.copyfileobj() directly into a file named after the current value of `i` (1.txt, 2.txt, and so on). Writing straight to the numbered name, rather than extracting under the original name and renaming afterwards, avoids two problems: a member stored in a subfolder would leave an empty directory behind, and a member whose own name is, say, 1.txt would overwrite a file extracted earlier. After each copy, `i` is incremented by 1 so that the next file gets a unique name and the extraction order is preserved in the numbering.
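
Before extracting anything, it can help to check which members of an archive the `.endswith(".txt")` test would actually match. The following is a minimal sketch of that check; the helper name `list_txt_members` is hypothetical and not part of the original script.

```python
import zipfile

def list_txt_members(zip_path):
    """Return the names of the .txt members in the archive, in archive order."""
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        # namelist() includes members stored in subfolders, e.g. "notes/a.txt"
        return [name for name in zip_file.namelist() if name.endswith(".txt")]
```

Calling `list_txt_members("zip1.zip")` returns a list of member names, which makes it easy to spot files in subfolders or files with an uppercase .TXT extension that the case-sensitive test would skip.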

Step 2: Merging .txt files:

The next step is to merge the extracted .txt files into a single file. The following function, merge_txt_files(), merges all the .txt files from the given temp folder into a single file called “merged.txt”.

def merge_txt_files(temp_folder):
    """Merges all the .txt files from the given temp folder into a single file called "merged.txt" """
    # os.listdir() returns names in arbitrary order, so sort by the numeric file name
    filenames = [name for name in os.listdir(temp_folder) if name.endswith(".txt")]
    filenames.sort(key=lambda name: int(os.path.splitext(name)[0]))
    with open("merged.txt", "w") as outfile:
        for filename in filenames:
            with open(os.path.join(temp_folder, filename)) as infile:
                outfile.write(infile.read())

This function first collects the .txt files in the temporary folder and sorts them by their numeric names, because os.listdir() returns entries in arbitrary order and a plain alphabetical sort would put 10.txt before 2.txt. It then uses the built-in open() function to create “merged.txt” in write mode in the current working directory. For each file, it reads the contents and writes them to “merged.txt”. The result is a single file containing the contents of all the files in extraction order. Note that the sort assumes every .txt file in the folder has a numeric name, as produced in Step 1.
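
Two details of merging are easy to overlook: if a file does not end with a newline, its last line runs into the first line of the next file, and open() without an explicit encoding falls back to a platform-dependent default. The sketch below is one possible variation that handles both. The function name `merge_with_separators`, its `output_path` and `encoding` parameters, and the choice of UTF-8 are assumptions for illustration, not part of the original script.

```python
import os

def _numeric_first(name):
    """Sort key: numbered files (1.txt, 2.txt, 10.txt) in numeric order, others by name."""
    stem = os.path.splitext(name)[0]
    return (0, int(stem), "") if stem.isdecimal() else (1, 0, stem)

def merge_with_separators(temp_folder, output_path="merged.txt", encoding="utf-8"):
    """Merge .txt files in order, making sure each file's text ends with a newline."""
    filenames = [name for name in os.listdir(temp_folder) if name.endswith(".txt")]
    filenames.sort(key=_numeric_first)
    with open(output_path, "w", encoding=encoding) as outfile:
        for filename in filenames:
            with open(os.path.join(temp_folder, filename), encoding=encoding) as infile:
                text = infile.read()
            outfile.write(text)
            # Add a newline only when the file does not already end with one
            if text and not text.endswith("\n"):
                outfile.write("\n")
```

If the source files use a different encoding, pass it explicitly, for example `merge_with_separators("temp", encoding="latin-1")`.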

Step 3: Cleaning up the Temporary Folder:

The final step is to delete the temporary folder after all the .txt files have been extracted and merged. The following function, delete_temp_folder(), deletes the given temp folder.

def delete_temp_folder(temp_folder):
    """Deletes the given temp folder"""
    os.rmdir(temp_folder)

This function uses os.rmdir() to delete the temporary folder. However, os.rmdir() only works on an empty folder and raises an OSError otherwise, so it would fail here because the extracted files are still inside. To delete a folder along with its contents, use shutil.rmtree(temp_folder) instead:

def delete_temp_folder(temp_folder):
    """Deletes the given temp folder and its contents"""
    shutil.rmtree(temp_folder)

This method will recursively delete the directory and all its contents, including any files and subdirectories.
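
As an alternative to creating and deleting the folder by hand, the standard library’s tempfile.TemporaryDirectory can manage the cleanup automatically. This is a swapped-in technique rather than the approach used in the rest of this post; the sketch below only demonstrates the lifetime of such a folder, and the file name 1.txt is just an example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_folder:
    # The folder exists only inside this block
    scratch = os.path.join(temp_folder, "1.txt")
    with open(scratch, "w") as f:
        f.write("example\n")
    print(os.path.exists(scratch))  # True while inside the block

# On leaving the block, the folder and everything in it has been removed
print(os.path.exists(temp_folder))  # False
```

Because the cleanup happens when the `with` block exits, the folder is removed even if extraction or merging raises an exception partway through.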

To use these functions, set the paths of your zip files and temporary folder, and then call the functions as follows:

# paths to the zip files
zip1_path = "zip1.zip"
zip2_path = "zip2.zip"

# create a temporary folder to extract the .txt files
temp_folder = "temp"
os.makedirs(temp_folder, exist_ok=True)

# extract the .txt files from the zip files
extract_txt_files(zip1_path, temp_folder)
extract_txt_files(zip2_path, temp_folder)

# merge the .txt files
merge_txt_files(temp_folder)

# delete the temporary folder
delete_temp_folder(temp_folder)
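
The example above hard-codes two zip paths. When there are many archives in one folder, the paths can be collected instead of listed by hand. The sketch below uses the standard glob module; the helper name `find_zip_files` is hypothetical and not part of the original script.

```python
import glob
import os

def find_zip_files(folder):
    """Return the .zip files directly inside a folder, sorted by name for a repeatable order."""
    return sorted(glob.glob(os.path.join(folder, "*.zip")))

# Possible usage with the functions from this post:
# for zip_path in find_zip_files("."):
#     extract_txt_files(zip_path, temp_folder)
```

Sorting the paths matters here because the extracted files are numbered in the order the archives are processed, which in turn determines the order of the merged text.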

Unzipping and Merging .txt files in Python: Final code

import os
import shutil
import zipfile

def extract_txt_files(zip_path, temp_folder):
    """Extracts all the .txt files from the given zip file to the given temp folder"""
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        # Continue numbering after any .txt files already in the temp folder
        i = len([name for name in os.listdir(temp_folder) if name.endswith(".txt")]) + 1
        for member in zip_file.infolist():
            if member.filename.endswith(".txt"):
                # Copy the member straight to its numbered name, so nested paths and
                # clashing filenames inside the zip cannot overwrite earlier files
                target = os.path.join(temp_folder, f"{i}.txt")
                with zip_file.open(member) as source, open(target, "wb") as dest:
                    shutil.copyfileobj(source, dest)
                i += 1

def merge_txt_files(temp_folder):
    """Merges all the .txt files from the given temp folder into a single file called "merged.txt" """
    # os.listdir() returns names in arbitrary order, so sort by the numeric file name
    filenames = [name for name in os.listdir(temp_folder) if name.endswith(".txt")]
    filenames.sort(key=lambda name: int(os.path.splitext(name)[0]))
    with open("merged.txt", "w") as outfile:
        for filename in filenames:
            with open(os.path.join(temp_folder, filename)) as infile:
                outfile.write(infile.read())

def delete_temp_folder(temp_folder):
    """Deletes the given temp folder and its contents"""
    shutil.rmtree(temp_folder)

# paths to the zip files
zip1_path = "zip1.zip"
zip2_path = "zip2.zip"

# create a temporary folder to extract the .txt files
temp_folder = "temp"
os.makedirs(temp_folder, exist_ok=True)

# extract the .txt files from the zip files
extract_txt_files(zip1_path, temp_folder)
extract_txt_files(zip2_path, temp_folder)

# merge the .txt files
merge_txt_files(temp_folder)

# delete the temporary folder
delete_temp_folder(temp_folder)


Conclusion

Overall, the script takes 3 steps:

  1. Unzipping and extracting all the .txt files from the given zip files to a temporary folder
  2. Merging the extracted .txt files into a single file called “merged.txt”
  3. Deleting the temporary folder and its contents

By following these simple steps, you can easily extract and merge .txt files from multiple zip files in Python. The provided code snippets can be customized to fit your specific use case. It’s good practice to keep a backup of the original files before performing any operations on them.