How to split a large .CSV file into batches

Vincent Jiang
3 min read · Dec 20, 2020

If you deal with many large .CSV files and want to split them into smaller ones, there are typically two things you can do:

  1. Use Python or another programming or command-line tool (e.g. a terminal shell, Windows cmd, Java) to split the files
  2. Build a database and insert the .CSV files into it before splitting them

The problem with solution one is that processing gigabytes of data puts a lot of stress on your local computer. You also need to be quite technical to handle the edge cases in your files.
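For reference, here is a minimal sketch of what solution one can look like in Python. It is only an illustration, not a recommended tool: the function name `split_csv`, the `rows_per_file` parameter, and the `_partNNNN` file naming are made up for this example, and it assumes a UTF-8 file whose first row is a header. It streams the file row by row, so memory use stays low, but it does nothing about the edge cases mentioned above (bad encodings, ragged rows, and so on).

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_file=100_000):
    """Stream `source` and write numbered parts of at most `rows_per_file` data rows.

    The header row is repeated at the top of every part.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(source).stem
    written = []
    out_file = None
    writer = None
    rows_in_part = 0

    try:
        with open(source, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:  # empty input: nothing to split
                return written
            for row in reader:
                # Start a new part for the first row and whenever the current part is full.
                if writer is None or rows_in_part >= rows_per_file:
                    if out_file is not None:
                        out_file.close()
                    part_path = out_dir / f"{stem}_part{len(written) + 1:04d}.csv"
                    out_file = open(part_path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    written.append(part_path)
                    rows_in_part = 0
                writer.writerow(row)
                rows_in_part += 1
    finally:
        if out_file is not None:
            out_file.close()
    return written
```

Calling `split_csv("sales.csv", "parts", rows_per_file=50_000)` (with a hypothetical `sales.csv`) would write `parts/sales_part0001.csv`, `parts/sales_part0002.csv`, and so on.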

The problem with solution two is that you have to design a table schema and find a server to host the database. You also need to write server-side code to maintain or change the database.
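To give a feel for solution two, here is a small sketch that uses SQLite from Python's standard library as a stand-in for a hosted database. The helper names `load_csv_into_sqlite` and `fetch_batch` and the table name `rows` are invented for this example. To sidestep schema design, it stores every column as TEXT, which is exactly the kind of shortcut a real schema would have to replace. It also assumes unique header names and rows that all have the same number of fields.

```python
import csv
import sqlite3


def load_csv_into_sqlite(conn, csv_path, table="rows"):
    """Create `table` with one TEXT column per header field and insert every row."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        # Quote identifiers so headers with spaces work; embedded quotes are doubled.
        cols = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        # Raises sqlite3.ProgrammingError if a row has the wrong number of fields.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return header


def fetch_batch(conn, table, batch_size, batch_index):
    """Return batch number `batch_index` (0-based), ordered by SQLite's implicit rowid."""
    cur = conn.execute(
        f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ? OFFSET ?',
        (batch_size, batch_size * batch_index),
    )
    return cur.fetchall()
```

With `conn = sqlite3.connect(":memory:")`, each call to `fetch_batch` returns one slice of the table that could then be written out as its own .CSV file.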

There is, however, another way to do this: upload your file to the cloud and have a cloud database handle it. This approach has several advantages: (1) you won’t lose the original file saved on your local machine, (2) you don’t need to build a database yourself, since one is already set up in the cloud, and (3) processing large files in the cloud is typically faster than processing them locally.

1. First, upload your .CSV file to a cloud database

By uploading your .CSV file to a cloud database, you’ve essentially turned your local .CSV file into a database.

Turn your csv file into a database
Image source: Acho Studio

2. The built-in parser will handle your data

Most .CSV files are not perfectly suited to databases. There could be (1) missing values in valid columns, (2) invalid column names (special characters), (3) too many columns (databases cap the number of columns per table; some cloud warehouses allow up to 10,000), (4) corrupt values, or (5) a mismatch between a column’s type and its values. Most of these problems are handled in Acho Studio. You can use the built-in parser to make sure that your .CSV files are correctly inserted into a cloud database.

Parse any csv file on cloud
Image source: Acho Studio
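If you are curious how a few of these problems can be spotted before uploading, the sketch below runs some basic checks locally. It is not how Acho Studio's parser works (that is not described here). The function name `audit_csv`, the report keys, and the rule that a "valid" column name contains only letters, digits, and underscores are all assumptions made for the example. It does not try to detect corrupt values or type mismatches.

```python
import csv
import re

# Assumed rule for a database-friendly column name: letters, digits, underscores,
# and not starting with a digit. Real databases differ on this.
VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def audit_csv(path, max_columns=10_000):
    """Return a small report on header and row problems found in a .CSV file."""
    report = {
        "invalid_column_names": [],
        "too_many_columns": False,
        "rows_with_missing_values": 0,
        "rows_with_wrong_length": 0,
    }
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, [])
        report["invalid_column_names"] = [n for n in header if not VALID_NAME.match(n)]
        report["too_many_columns"] = len(header) > max_columns
        for row in reader:
            if len(row) != len(header):
                # Row has more or fewer fields than the header.
                report["rows_with_wrong_length"] += 1
            elif any(cell.strip() == "" for cell in row):
                # Right number of fields, but at least one is empty.
                report["rows_with_missing_values"] += 1
    return report
```

A report with empty lists and zero counts only means these particular checks passed. It says nothing about whether the values themselves make sense.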

3. Load your table

Once your table is successfully loaded, you can start transforming it into the format you want. Since the table lives in a cloud RDBMS, transforming it never alters the original file. Things you can do include “Filter”, “Join/Merge”, “Find & Replace”, “Formula”, “SQL”, etc. These actions help you prepare the final table before splitting it.

split csv into multiple files
Image source: Acho Studio

4. Download your data in batches

The last step, of course, is to download your finalized table in batches.

a. Click on the little cloud button on the top right.

b. Check the “Batch download” option.

c. Then you can select the rows and batches.

d. The downloaded file will be a .zip file that contains all your .CSV files.

split csv into multiple files
Image source: Acho Studio
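Once the .zip file is on your machine, you may want a quick sanity check that each batch holds the number of rows you expected. The sketch below counts data rows per .CSV inside an archive without extracting it. The function name `rows_per_batch` is made up, and the code assumes each file in the archive is UTF-8 and starts with a header row, which may or may not match what your download contains.

```python
import csv
import io
import zipfile


def rows_per_batch(zip_path):
    """Map each .csv file name in the archive to its number of data rows."""
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # ignore anything in the archive that is not a .CSV file
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                next(reader, None)  # skip the assumed header row
                counts[name] = sum(1 for _ in reader)
    return counts
```

Summing the values of the returned dictionary should give the row count of the table you downloaded, if the header assumption holds.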
