Preferred Method of Exploring Large Datasets
by countdown1945 - Tuesday April 30, 2024 at 03:27 AM
#1
What is everyone's preferred method of exploring large datasets and dumps? So far I've had luck using the Linux terminal (grepping, awk'ing, etc.).

Loading really huge dumps will inevitably make my shit freeze.

What's your preferred method of exploring datasets?
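For what it's worth, streaming is exactly why grep/awk don't freeze: they read one line at a time instead of loading the file. A minimal sketch, using a tiny made-up `users.csv` (id,email,password) as a stand-in for a multi-GB dump:

```shell
# Tiny stand-in for a huge dump; the commands below scale to any size
# because they stream line by line rather than loading the whole file.
printf '1,alice@gmail.com,x\n2,bob@proton.me,y\n3,carol@gmail.com,z\n' > users.csv

# Count matching lines without ever opening the file in an editor.
grep -c '@gmail.com' users.csv            # -> 2

# Pull out just the email column.
awk -F',' '{print $2}' users.csv
```

The same pattern (grep to filter, awk to slice columns) works on dumps far larger than RAM.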
Reply
#2
For medium-sized data, a simple Python/pandas solution (or even Excel) is enough, but truly large data can't be processed on a single computer; a cluster is needed.
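In the same spirit, even without a cluster you can fake chunked processing on one machine: split the dump into pieces and handle each independently. A rough sketch with a tiny stand-in file (`big.txt` and the chunk names are made up for illustration):

```shell
# Tiny stand-in for a huge dump.
seq 1 10 > big.txt

# Split into 4-line chunks named chunk_aa, chunk_ab, chunk_ac.
split -l 4 big.txt chunk_

# Each chunk can now be processed (or shipped to another machine) on its own.
wc -l < chunk_aa    # -> 4
```

For real dumps you'd split by lines or bytes (`split -b 1G`) and run the same filter over each chunk in parallel.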
Reply
#3
I like to first analyze what type of data it contains and, once I've figured that out, load it into a local database. For me it's simple and easy to dump all the data into a database table (with the proper format) and run SELECTs or whatever you need to read the data. It depends on the size of the dump, though: usually I can use grep/awk/CLI commands to extract the data, but if the dump is large, the best option is loading it into a database.
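A lightweight version of this route that skips running a database server: SQLite's CLI can import a CSV directly. A sketch assuming a hypothetical `leak.csv` with a header row (requires the `sqlite3` command-line tool):

```shell
# Hypothetical dump with a header row.
printf 'id,email,password\n1,a@x.com,p1\n2,b@y.com,p2\n' > leak.csv

# .import creates the table from the header row when it doesn't exist yet.
sqlite3 dump.db <<'EOF'
.mode csv
.import leak.csv accounts
EOF

# Now query it like any database.
sqlite3 dump.db "SELECT email FROM accounts WHERE email LIKE '%@x.com';"
```

Once it's in a table, indexed lookups beat re-grepping the raw file every time.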
Reply
#4
This thread is really interesting and useful to me because I'm just trying to figure things out. So far my solution was to buy more RAM and open a large (15 GB?) .sql file in Notepad++. It worked, but I'm sure there are better methods.

I've been looking at Postgres, but I don't understand the process of importing a .sql file into it and then running queries. I learn best hands-on, but it's been rough having no real direction.
Reply
#5
(Apr 30, 2024, 03:37 PM)user512 Wrote: This thread is really interesting and useful to me because I'm just trying to figure things out. So far my solution was to buy more RAM and open a large (15 GB?) .sql file in Notepad++. It worked, but I'm sure there are better methods.

I've been looking at Postgres, but I don't understand the process of importing a .sql file into it and then running queries. I learn best hands-on, but it's been rough having no real direction.

So as far as I'm aware, the process for importing an SQL dump into PostgreSQL is fairly straightforward:

1. Create a database
2. Connect to the database
3. Use: \i /path/to/your/dump_file.sql
4. Use queries to find relevant data
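Spelled out as shell commands, it looks something like this (assumes a local PostgreSQL server is running; `leakdb` and the `users` table are made-up names — adjust to your dump):

```shell
createdb leakdb      # 1. create a database
psql -d leakdb       # 2. connect to it; then, at the psql prompt:
#   \i /path/to/your/dump_file.sql      -- 3. import the dump (\i takes no semicolon)
#   SELECT * FROM users LIMIT 10;       -- 4. query for relevant data
```

Note that `\i` is a psql meta-command, not SQL, so it ends at the newline rather than at a semicolon.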
Reply
#6
(May 01, 2024, 12:21 AM)countdown1945 Wrote:
(Apr 30, 2024, 03:37 PM)user512 Wrote: This thread is really interesting and useful to me because I'm just trying to figure things out. So far my solution was to buy more RAM and open a large (15 GB?) .sql file in Notepad++. It worked, but I'm sure there are better methods.

I've been looking at Postgres, but I don't understand the process of importing a .sql file into it and then running queries. I learn best hands-on, but it's been rough having no real direction.

So as far as I'm aware, the process for importing an SQL dump into PostgreSQL is fairly straightforward:

1. Create a database
2. Connect to the database
3. Use: \i /path/to/your/dump_file.sql
4. Use queries to find relevant data

Amazing, thank you. I'll start trying to play around with that. 

Generally, what's considered a large dataset? I expected 15 GB to be pretty huge, but I was able to navigate it reasonably well with Notepad++.
Reply
#7
EmEditor to the rescue. That's what most people here use.
Reply
#8
On Windows I use Select-String in PowerShell, and on Linux, grep and cut.
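A quick sketch of the Linux half (Select-String plays roughly the same role on Windows); `users.txt` is a made-up sample file:

```shell
# Made-up colon-delimited sample.
printf 'alice:1001\nbob:1002\ncharlie:2003\n' > users.txt

# grep filters the lines, cut slices out a field.
grep '^b' users.txt | cut -d: -f2    # -> 1002
```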
Reply

