Initially, databases mostly processed numeric data. For this reason,
database engines and their internal data management algorithms were
designed primarily around numeric data types. The well-known database
types, classified by architecture, are:
|
|
1. Hierarchical Database
|
2. Relational Database
|
3. Object-Relational Database
|
4. NoSQL Database
|
5. Graph Database
|
6. Network Database
|
|
Technology continuously evolves. As the Internet grew from a defence
network (ARPANET) into an open commercial network, the data capture
process expanded from numeric data to non-numeric data such as images,
streaming video, email output, XML, social media data, and so on.
Initially, financial institutions/banks stored check images in
relational databases as BLOBs, and character data such as XML in
CLOB format. Storing image and video data as BLOBs in an RDBMS
had performance and scalability issues.
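|
As a minimal sketch of this legacy pattern, using SQLite's standard-library bindings in place of a commercial RDBMS (the table, column names, and data are illustrative, and SQLite's TEXT type stands in for a CLOB):

```python
import sqlite3

# In-memory database standing in for a bank's RDBMS (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checks (check_id INTEGER PRIMARY KEY, "
    "image BLOB, metadata_xml TEXT)"  # TEXT plays the role of a CLOB here
)

# Store the scanned check image as a BLOB alongside its XML metadata.
image_bytes = b"\x89PNG...fake image bytes..."
xml_doc = "<check><amount>100.00</amount></check>"
conn.execute(
    "INSERT INTO checks (check_id, image, metadata_xml) VALUES (?, ?, ?)",
    (1, image_bytes, xml_doc),
)

# Retrieval pulls the full binary payload back through the database
# engine, which is what hurt performance and scalability as image and
# video payloads grew.
row = conn.execute(
    "SELECT image, metadata_xml FROM checks WHERE check_id = ?", (1,)
).fetchone()
```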
|
|
One medical research institute experimented with storing small images
in a relational database and retrieving them onto its landing page
after user login, to improve the customer experience. This design
quickly became a bottleneck, causing hung web pages, slow loading of
page content, and other severe performance issues. Storage was switched
to a LAN folder (/images), which solved the problem: each image file is
simply referenced by its folder path in a plain HTML image tag in ASP
or JSP pages.
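|
That switch can be sketched as follows: the image lives on shared storage and the page emits only a path, so the web server serves the file directly and the database is never touched (the directory, filename, and tag below are hypothetical):

```python
import os
import tempfile

# Stand-in for the shared LAN folder, e.g. a mounted /images share.
images_dir = tempfile.mkdtemp()

# The upload step writes the file to the share instead of a BLOB column.
filename = "user42_banner.png"
with open(os.path.join(images_dir, filename), "wb") as f:
    f.write(b"\x89PNG...fake image bytes...")

# The page then references the file by path in a plain HTML image tag;
# only this short string is stored or rendered, never the image bytes.
img_tag = f'<img src="/images/{filename}" alt="banner">'
```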
|
|
Similar LAN storage methods were used for videos in various companies.
Later on, public/non-proprietary (sales/marketing) videos were uploaded
to YouTube and the video links were embedded in company websites. To
store and retrieve data at a rapid rate, several algorithms were
created.
|
|
Big Data technology is capable of storing all types of data and
handling data in the range of petabytes (PB), exabytes (EB), and
higher. The data is stored with high redundancy on commodity hardware
to avoid a single point of failure. The fundamental principle of big
data is the 3Vs: Volume, Velocity, and Variety.
|
|
Volume:
|
Big data can manage massive volumes of data in the range of
petabytes (PB), exabytes (EB), and higher.
|
|
Velocity:
|
Big data can manage data created at a very high rate, or velocity:
continuous video feeds captured at many points, credit card swipes by
cardholders around the world, and so on.
|
|
Variety:
|
The variety aspect of big data allows data to be of any type/format:
video files (MPEG, AVI), XML, any form of large files, and so on.
|
|
There are several tools and languages for reading data from big data
storage frameworks so that it can be analyzed, and AI and ML
technologies can be used in a seamless manner. Numeric and structured
data can be queried with ANSI-compliant SQL. Because data is stored in
native formats, data quality can sometimes be compromised by the
velocity of data capture. There are tools that interface with big data
repositories and filter the data for quality; good-quality data from
these tools is then used for analysis with AI and ML.
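|
For the structured, numeric portion of such data, a standard ANSI-style SQL query is sufficient. A minimal sketch, using SQLite in place of a big-data SQL engine (the table and values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE swipes (card_id INTEGER, amount REAL, country TEXT)"
)
conn.executemany(
    "INSERT INTO swipes VALUES (?, ?, ?)",
    [(1, 25.0, "US"), (1, 40.0, "US"), (2, 15.0, "IN")],
)

# An ANSI-compliant aggregate: total swipe amount per country.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM swipes "
    "GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('IN', 15.0), ('US', 65.0)]
```

The same SELECT/GROUP BY statement would run largely unchanged on SQL-on-big-data engines that accept ANSI-style syntax.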
|
|