Initially, databases mostly processed numeric data. For this reason,
database engines and their internal data management algorithms were
designed primarily around numeric data types. The well-known database
types, classified by architecture, are:
|
|
1. Hierarchical Database
|
2. Relational Database
|
3. Object-Relational Database
|
4. NoSQL Database
|
5. Graph Database
|
6. Network Database
|
|
Technology continuously evolves. As the Internet grew from a defence
network (ARPANET) into an open commercial network, the data capture
process expanded from numeric data to non-numeric data such as images,
streaming video, email output, XML, social media data, and so on.
Initially, financial institutions/banks stored check images in
relational databases as BLOBs, and character data such as XML in
CLOB format. Storing image and video data as BLOBs in an RDBMS
had performance and scalability issues.
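|
As a minimal sketch of this legacy pattern, using SQLite's standard-library bindings in place of a commercial RDBMS (the table, column names, and data are illustrative, and SQLite's TEXT type stands in for a CLOB):

```python
import sqlite3

# In-memory database standing in for a bank's RDBMS (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checks (check_id INTEGER PRIMARY KEY, "
    "image BLOB, metadata_xml TEXT)"  # TEXT plays the role of a CLOB here
)

# Store the scanned check image as a BLOB alongside its XML metadata.
image_bytes = b"\x89PNG...fake image bytes..."
xml_doc = "<check><amount>100.00</amount></check>"
conn.execute(
    "INSERT INTO checks (check_id, image, metadata_xml) VALUES (?, ?, ?)",
    (1, image_bytes, xml_doc),
)

# Retrieval pulls the full binary payload back through the database
# engine, which is what hurt performance and scalability as image and
# video payloads grew.
row = conn.execute(
    "SELECT image, metadata_xml FROM checks WHERE check_id = ?", (1,)
).fetchone()
```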
|
|
One medical research institute experimented with storing small images
in a relational database and retrieving them onto its landing page
after user login, to improve the customer experience. This design
quickly became a bottleneck, causing hung web pages, slow loading of
page content, and other severe performance issues. Storage was switched
to a LAN folder (/images), which solved the problem: each image file is
simply referenced by its folder path in a plain HTML image tag in ASP
or JSP pages.
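|
That switch can be sketched as follows: the image lives on shared storage and the page emits only a path, so the web server serves the file directly and the database is never touched (the directory, filename, and tag below are hypothetical):

```python
import os
import tempfile

# Stand-in for the shared LAN folder, e.g. a mounted /images share.
images_dir = tempfile.mkdtemp()

# The upload step writes the file to the share instead of a BLOB column.
filename = "user42_banner.png"
with open(os.path.join(images_dir, filename), "wb") as f:
    f.write(b"\x89PNG...fake image bytes...")

# The page then references the file by path in a plain HTML image tag;
# only this short string is stored or rendered, never the image bytes.
img_tag = f'<img src="/images/{filename}" alt="banner">'
```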
|
|
Similar LAN storage methods were used for videos in various companies.
Later on, public/non-proprietary (sales/marketing) videos were uploaded
to YouTube and the video links were embedded in company websites. To
store and retrieve data at a rapid rate, several algorithms were
created.
|
|
Big Data technology is capable of storing all types of data and
handling data in the range of petabytes (PB), exabytes (EB), and
higher. The data is stored with high redundancy on commodity hardware
to avoid a single point of failure. The fundamental principle of big
data is the 3Vs: Volume, Velocity, and Variety.
|
|
Volume:
|
Big data can manage massive volumes of data in the range of
petabytes (PB), exabytes (EB), and higher.
|
|
Velocity:
|
Big data can manage data created at a very high rate, or velocity:
continuous video feeds captured at many points, credit card swipes by
cardholders around the world, and so on.
|
|
Variety:
|
The variety aspect of big data allows data to be of any type/format:
video files (MPEG, AVI), XML, any form of large files, and so on.
|
|
There are several tools and languages for reading data from big data
storage frameworks so that it can be analyzed, and AI and ML
technologies can be used in a seamless manner. Numeric and structured
data can be queried with ANSI-compliant SQL. Because data is stored in
native formats, data quality can sometimes be compromised by the
velocity of data capture. There are tools that interface with big data
repositories and filter the data for quality; good-quality data from
these tools is then used for analysis with AI and ML.
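|
For the structured, numeric portion of such data, a standard ANSI-style SQL query is sufficient. A minimal sketch, using SQLite in place of a big-data SQL engine (the table and values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE swipes (card_id INTEGER, amount REAL, country TEXT)"
)
conn.executemany(
    "INSERT INTO swipes VALUES (?, ?, ?)",
    [(1, 25.0, "US"), (1, 40.0, "US"), (2, 15.0, "IN")],
)

# An ANSI-compliant aggregate: total swipe amount per country.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM swipes "
    "GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('IN', 15.0), ('US', 65.0)]
```

The same SELECT/GROUP BY statement would run largely unchanged on SQL-on-big-data engines that accept ANSI-style syntax.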
|
|