When we think about big data, the first thing that comes to mind is a huge amount of data. This description is accurate as far as it goes, but big data has other defining characteristics besides sheer size.
To understand what big data is, it helps to first think about why we need it. After all, database management systems and data warehouses can already handle huge amounts of data.
Data is being generated at an ever-increasing rate. An often-cited estimate is that 90% of today's data was generated in the last two years, which gives some idea of how fast it is growing. And it is not just the amount of data that is increasing; the types of data being generated are also becoming more varied.
Some of the different types of data being generated today are:
- Social media data: from social networking sites such as Twitter and Google
- Databases: from systems such as enterprise applications
- XML and JSON: from web services and applications
- Raw data: from systems such as mobile sensors
- Images, video and audio: from systems such as enterprise applications and social networking sites
So the data being generated today, and the data that will be generated in the future, has two main characteristics:
- Huge volume of data
- Different types of data
Apart from these two, there is one more attribute common to this data: velocity. Velocity is the speed at which data is generated.
So the characteristics that describe big data are:
- Volume: a huge amount of data, typically terabytes or petabytes
- Variety: different types of data, such as structured, semi-structured and unstructured data (e.g., relational data, video, audio, emails)
- Velocity: the high rate at which data is generated and processed
These are also called the three V's of big data. A data source that has these characteristics is considered big data.
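Variety is the easiest of the three V's to see in a few lines of code. The following is a minimal sketch, using made-up sample records, of how differently the three kinds of data have to be handled: structured data has a fixed schema, semi-structured data (here JSON) describes itself but may have nested or missing fields, and unstructured text has no fields at all. The function names are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Structured: fixed schema, every row has the same columns (like a relational table).
structured = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"

# Semi-structured: self-describing JSON; fields may be nested or missing.
semi_structured = '[{"id": 1, "name": "Asha", "tags": ["sports"]}, {"id": 2, "name": "Ravi"}]'

# Unstructured: free text with no predefined fields.
unstructured = "Great service today! Will definitely order again."


def read_structured(text):
    """Parse CSV rows into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Parse JSON and fill in a default for the optional 'tags' field."""
    records = json.loads(text)
    for record in records:
        record.setdefault("tags", [])
    return records


def read_unstructured(text):
    """No schema to rely on: fall back to a crude word tokenization."""
    return [word.strip("!.,").lower() for word in text.split()]


if __name__ == "__main__":
    print(read_structured(structured))
    print(read_semi_structured(semi_structured))
    print(read_unstructured(unstructured))
```

Each reader needs a different strategy, and a big data system has to cope with all of them side by side.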
In simple words, big data refers to data sets so large that traditional data processing applications are not capable of processing them.
The main challenges posed by such data sets are:
Analyzing big data can provide benefits such as predicting future outcomes by revealing patterns in existing data. Some example areas are:
- Predicting the challenges an organization may face
- Diagnosing diseases that could occur in a particular geographic region
- Predicting customers' buying behavior based on their past purchase history
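To make "revealing patterns in existing data" concrete, here is a toy sketch for the last example: it counts, across hypothetical purchase histories, which product customers most often buy right after another one, and uses that count to predict the next purchase. This simple first-order frequency model is an illustrative assumption, not a production recommender; the data and function names are made up.

```python
from collections import Counter, defaultdict

# Hypothetical purchase histories: each list is one customer's purchases in order.
histories = [
    ["phone", "case", "charger"],
    ["phone", "case", "headphones"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]


def learn_next_item_counts(histories):
    """Count how often item B is bought immediately after item A."""
    follows = defaultdict(Counter)
    for history in histories:
        for current, nxt in zip(history, history[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, last_item):
    """Return the item most often bought after last_item, or None if never seen."""
    if last_item not in follows:
        return None
    return follows[last_item].most_common(1)[0][0]


if __name__ == "__main__":
    follows = learn_next_item_counts(histories)
    print(predict_next(follows, "phone"))  # "case" follows "phone" in 2 of 3 histories
```

With millions of histories instead of four, the same counting idea is what makes volume an asset rather than just a storage problem.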
Big data analytics enables better, more timely and more accurate decision making for an organization.
There are a few different technologies and techniques for handling big data. Vendors such as Oracle and Microsoft offer their own technologies, but the most widely used is Apache Hadoop, an open source project originally created by Doug Cutting and Mike Cafarella.
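Hadoop's original processing model is MapReduce: input is split into pieces that map tasks process in parallel across a cluster, the framework then shuffles the intermediate results so that all values with the same key end up together, and reduce tasks aggregate each group. The sketch below simulates those three phases in a single Python process for the classic word-count example. It does not use Hadoop's actual (Java) API; the function names are hypothetical and only mirror the phase names.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each split would be mapped on a different node; here we just loop.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data is big", "data is everywhere"]
    print(word_count(splits))
```

Because each mapper only sees its own split and each reducer only sees one key's values, the work can be spread across many machines, which is what lets this model scale to big data volumes.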