REQUEST How to view a very huge csv data file ?
by curiousbeast - Friday August 16, 2024 at 06:24 PM
#31
I find that visualizing 10GB of data in a GUI is mostly useless. I agree with masedan: use the command line. The examples provided earlier in the thread are excellent.

Another idea is to throw it all into an Elasticsearch instance or a PostgreSQL database and run your searches against that. If you need help, PM me; I can help you get some scripts to push the data to either of those.

Another way is to split the file up based on the most common search criteria. For example, if your data contains a "State" field, you could write a script to read each line and place it in a CSV named for that state. That way, when you are searching, rather than scanning the entire dataset, you only search the file for the state you need, which should make things a lot faster.
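
A minimal sketch of that per-field split in Python, in case it helps. The function name split_by_column, the output directory layout and the file-name sanitizing are my own assumptions for illustration. It also assumes the first line is a header and that the file is UTF-8.

```python
import csv
import os
import re


def split_by_column(src_path, out_dir, column):
    """Stream src_path once and append each row to out_dir/<value>.csv.

    Hypothetical helper: assumes a header row that contains `column`.
    """
    os.makedirs(out_dir, exist_ok=True)
    outputs = {}  # sanitized value -> (file handle, csv writer)
    try:
        with open(src_path, newline="", encoding="utf-8", errors="replace") as src:
            reader = csv.reader(src)
            header = next(reader)
            idx = header.index(column)
            for row in reader:
                value = row[idx].strip() if len(row) > idx else ""
                # Keep only characters that are safe in a file name.
                name = re.sub(r"[^A-Za-z0-9_-]", "_", value) or "UNKNOWN"
                if name not in outputs:
                    path = os.path.join(out_dir, name + ".csv")
                    fh = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(fh)
                    writer.writerow(header)  # every output file gets the header
                    outputs[name] = (fh, writer)
                outputs[name][1].writerow(row)
    finally:
        for fh, _ in outputs.values():
            fh.close()
    return sorted(outputs)
```

This keeps one output file open per distinct value, which is fine for something like states but would hit the open-file limit on a field with many thousands of distinct values.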

Yet another way is to create an index for common search terms. For example, if there is an email address field, you can MD5 that field and store the hash along with the offset in the larger file where that record starts. Then store the MD5+offset records in sorted order in a file. That way, when you search for an email address, you can check whether it exists in the index very quickly (a binary search takes O(log N) time) and retrieve the exact offset. Then a quick fseek to that offset, read until the end of the line, and there you go.
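
Here is a rough Python sketch of that MD5+offset index, under a few assumptions of mine: the index is a flat file of fixed 24-byte records (16-byte digest plus 8-byte offset), values are stripped and lowercased before hashing, and no CSV field contains an embedded newline. The names build_index, lookup and RECORD are made up for this example.

```python
import csv
import hashlib
import os
import struct

# Assumed record layout: 16-byte MD5 digest + 8-byte big-endian file offset.
RECORD = struct.Struct(">16sQ")


def _key(value):
    # Normalizing (strip + lowercase) is an assumption, not part of the idea itself.
    return hashlib.md5(value.strip().lower().encode("utf-8")).digest()


def build_index(csv_path, index_path, column):
    """Write sorted (digest, offset) records for `column` to index_path."""
    entries = []
    with open(csv_path, "rb") as f:
        header = next(csv.reader([f.readline().decode("utf-8", "replace")]))
        idx = header.index(column)
        while True:
            offset = f.tell()  # byte offset where this line starts
            line = f.readline()
            if not line:
                break
            fields = next(csv.reader([line.decode("utf-8", "replace")]), [])
            if len(fields) > idx:
                entries.append((_key(fields[idx]), offset))
    entries.sort()  # sorted in memory; a very large file would need an external sort
    with open(index_path, "wb") as out:
        for digest, offset in entries:
            out.write(RECORD.pack(digest, offset))


def lookup(csv_path, index_path, value):
    """Binary-search the index, then seek into the CSV for each matching line."""
    target = _key(value)
    matches = []
    with open(index_path, "rb") as ix, open(csv_path, "rb") as data:
        lo, hi = 0, os.path.getsize(index_path) // RECORD.size
        while lo < hi:  # find the first record whose digest is >= target
            mid = (lo + hi) // 2
            ix.seek(mid * RECORD.size)
            digest, _ = RECORD.unpack(ix.read(RECORD.size))
            if digest < target:
                lo = mid + 1
            else:
                hi = mid
        ix.seek(lo * RECORD.size)
        while True:  # collect every record with the same digest
            chunk = ix.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break
            digest, offset = RECORD.unpack(chunk)
            if digest != target:
                break
            data.seek(offset)
            matches.append(data.readline().decode("utf-8", "replace").rstrip("\r\n"))
    return matches
```

You pay the full scan once in build_index; after that, each lookup only touches a handful of index records plus the matching lines.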

All this to say that there are a ton of ways to solve the issue, but looking at this in a GUI is rarely useful.

My two cents.
#32
Use grep -riE "NAME1.*NAME2" /Desktop/user/...

This is for searching inside high-volume files. If you want to organize big files, you could convert the .csv to a .db with a simple Python script, so you get all the data in columns with a clean presentation.
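
A minimal sketch of such a conversion script, assuming ".db" means an SQLite database. The function name csv_to_sqlite, the table name "records" and the batch size are placeholders I picked. Every column is stored as TEXT, and the header is assumed to have unique, non-empty column names.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table="records", batch_size=50_000):
    """Stream a CSV into an SQLite table in batches; returns rows inserted."""
    total = 0
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as src:
        reader = csv.reader(src)
        header = next(reader)
        # Quote column names so spaces or odd characters do not break the SQL.
        cols = ", ".join('"' + name.replace('"', '""') + '" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        try:
            con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            insert = f'INSERT INTO "{table}" VALUES ({placeholders})'
            batch = []
            for row in reader:
                # Pad short rows and trim long ones so they match the header.
                batch.append((row + [""] * len(header))[: len(header)])
                if len(batch) >= batch_size:
                    con.executemany(insert, batch)
                    con.commit()
                    total += len(batch)
                    batch = []
            if batch:
                con.executemany(insert, batch)
                con.commit()
                total += len(batch)
        finally:
            con.close()
    return total
```

Once the data is loaded, a CREATE INDEX on the columns you search most makes lookups much faster than scanning the whole table.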
#33
I downloaded a leak and many of the files contain 00 (NUL) bytes, which is why so many of them are so big.

So I tried this command on Linux:

tr -d '\000' < filename > new_filename

That reduced the size of many of the files.
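
If tr is not available (on Windows, for example), roughly the same thing can be done in Python by streaming the file in chunks. The function name strip_nul_bytes and the 1 MiB chunk size are just my choices for this sketch.

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path without NUL bytes; returns bytes removed."""
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # never holds the whole file in memory
            if not chunk:
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

One caveat: if roughly every other byte is 00, the file is probably UTF-16 text, and converting the encoding properly is safer than deleting bytes, since stripping NULs mangles any non-ASCII characters.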
#34
EmEditor is more practical. I hope it helps.

#35
You should split the file into 10GB files.
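
A small Python sketch of a size-based split that repeats the header in every part. The function name split_by_size and the part naming scheme are made up for illustration, and it assumes one record per line (no newlines inside quoted fields).

```python
def split_by_size(src_path, out_prefix, max_bytes):
    """Split a CSV into parts of at most about max_bytes each, on line boundaries.

    Returns the list of part paths. Each part starts with the original header.
    """
    parts = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        header = src.readline()
        try:
            for line in src:
                # Start a new part when the current one would exceed the limit.
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    path = f"{out_prefix}_{len(parts) + 1:04d}.csv"
                    out = open(path, "wb")
                    out.write(header)
                    written = len(header)
                    parts.append(path)
                out.write(line)
                written += len(line)
        finally:
            if out is not None:
                out.close()
    return parts
```

For 10GB parts you would pass max_bytes=10 * 1024**3. A single line longer than the limit still gets written, so a part can go over in that case.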

#36
(Mar 26, 2025, 07:23 PM)robertcollins445 Wrote: I find that visualizing 10GB of data in a GUI is mostly useless. I agree with masedan: use the command line. The examples provided earlier in the thread are excellent.

Nothing more to say after this

#37
(Aug 21, 2024, 11:19 AM)curiousbeast Wrote: for the moment I can't run Emeditor on Linux, so, I use glogg on linux, it's pretty cool ;-)

You can use it on Linux if you install Wine.
And with the reinstall option, you get a new free 7-day Pro trial each time you reinstall it.

#38
I did a search and didn't see it:

Tablecruncher, CSV viewer/editor/splitter (Windows/macOS/Linux).

Lightweight and easy to use.
#39
(Jan 18, 2026, 08:53 AM)Str3ngv3r Wrote: I did a search and didn't see it:

Tablecruncher, CSV viewer/editor/splitter (Windows/macOS/Linux).

Lightweight and easy to use.

Very interesting tool, open source (GPLv3) and multi-platform, though from what I can see it doesn't get many updates.
Thanks for sharing, I'll test it.

#40
Also trying to figure this out!