How do you analyze data directly from MySQL, PostgreSQL, and MongoDB?

If you’re using MySQL Workbench, pgAdmin, or another visual database design tool, you have probably wondered why it is so hard to use them for advanced data analytics.

First, let us define what analytics are. Data analysis is essentially…

Traditionally, companies have used ETL pipelines to connect production systems with data warehouses. This has changed a lot in recent years, however, as businesses demand more flexibility and lower costs.

From a technical perspective, there are two main differences between ETL and ELT:

  • ETL pipelines transform data on a third-party server…

What is data cleaning/cleansing?

Data cleansing, also known as data cleaning, is the process of detecting and correcting or removing corrupt records from a table, dataset, or database. It involves identifying erroneous, incomplete, irrelevant, inaccurate, or otherwise problematic portions of the data and then replacing, modifying, or deleting that dirty data.
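As a rough illustration of the replace, modify, and delete steps, here is a minimal Python sketch. The field names `id` and `email`, the function name `clean_records`, and the specific rules (trim whitespace, lower-case emails, drop rows with missing values or duplicate ids) are assumptions made up for this example; real cleaning rules depend on your data.

```python
def clean_records(records):
    """Split dict records into (cleaned, rejected) lists.

    Assumes every record has string-valued "id" and "email" fields
    (hypothetical names chosen for this sketch).
    """
    cleaned, rejected, seen_ids = [], [], set()
    for record in records:
        # Modify: trim stray whitespace from every string value.
        row = {k: v.strip() if isinstance(v, str) else v
               for k, v in record.items()}
        # Delete: reject rows where a required field is missing or empty.
        if any(row.get(field) in (None, "") for field in ("id", "email")):
            rejected.append(row)
            continue
        # Replace: store the email in a single canonical (lower-case) form.
        row["email"] = row["email"].lower()
        # Delete: reject rows whose id has already been seen.
        if row["id"] in seen_ids:
            rejected.append(row)
            continue
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned, rejected
```

Keeping the rejected rows instead of silently discarding them makes it easier to review what the cleaning step removed.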

In general…

Large CSV files are hard to open in most desktop programs. In fact, an Excel spreadsheet has a hard limit of 1,048,576 rows and 16,384 columns. If your table is larger, you may need to parse and process it with Python, SAS, or R. …
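For the Python route, one possible approach is to stream the file row by row with the standard `csv` module, so the whole table never has to fit in memory. This is only a sketch: the function name `summarize_csv` and the idea of summing one numeric column are assumptions chosen for illustration, and the file is assumed to be UTF-8 with a header row.

```python
import csv


def summarize_csv(path, column):
    """Stream a CSV file and return (row_count, total) for one column.

    Rows are read one at a time, so the file can have far more rows
    than a spreadsheet could display.
    """
    row_count = 0
    total = 0.0
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            row_count += 1
            value = row.get(column) or ""
            try:
                total += float(value)
            except ValueError:
                # Skip blank or non-numeric cells instead of failing.
                pass
    return row_count, total
```

Calling `summarize_csv("orders.csv", "amount")` (both names hypothetical) would return the number of data rows and the sum of the `amount` column.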

Maybe you’ve experienced this in the past: Excel is sometimes not the most stable application on your computer. The reasons behind Excel crashing are complicated, though; it’s a mix of your usage, computational power, and data size and type.

To understand why Excel crashes, you’ll first need to understand what it does…

