Big Data

Data Analytics And Big Data

Big Data

Big Data

Big data is a combination of structured, semi-structured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications.

Systems that process and store big data have become a common component of data management architectures in organizations, combined with tools that support big data analytics uses. Big data is often characterized by the three V's:

  • The large volume of data in many environments;
  • The wide variety of data types frequently stored in big data systems; and
  • The velocity, at which much of the data is generated, collected and processed.

Although big data doesn't equate to any specific volume of data, big data deployments often involve terabytes, pet bytes and even Exabyte’s of data created and collected over time.

Companies use big data in their systems to improve operations, provide better customer service, create personalized marketing campaigns and take other actions that, ultimately, can increase revenue and profits. Businesses that use it effectively hold a potential competitive advantage over those that don't because they're able to make faster and more informed business decisions.


  1. Better customer insight
  2. Improved operations
  3. More insight market Intelligence
  4. Agile supply chain management
  5. Data driven innovation 
  6. Smarter recommendations and targeting 

Breaking down the V's of Big Data:
Volume is the most commonly cited characteristic of big data. A big data environment doesn't have to contain a large amount of data, but most do because of the nature of the data being collected and stored in them. Click streams, system logs and stream processing systems are among the sources that typically produce massive volumes of data on an ongoing basis.
Big data also encompasses a wide variety of data types, including the following:

  • Structured data, such as transactions and financial records;
  • Unstructured data, such as text, documents and multimedia files; and
  • Semi-structured data, such as web server logs and streaming data from sensors.

Various data types may need to be stored and managed together in big data systems. In addition, big data applications often include multiple data sets that may not be integrated upfront. For example, a big data analytics project may attempt to forecast sales of a product by correlating data on past sales, returns, online reviews and customer service calls.
Velocity refers to the speed at which data is generated and must be processed and analyzed. In many cases, sets of big data are updated on a real- or near-real-time basis, instead of the daily, weekly or monthly updates made in many traditional data warehouses. Managing data velocity is also important as big data analysis further expands into machine learning and artificial intelligence (AI), where analytical processes automatically find patterns in data and use them to generate insights.


  • Apache Hadoop ecosystem (Hadoop Common)
  • Hadoop Distributed File System
  • Hadoop YARN
  • Hadoop MapReduce
  • Hadoop Ozone
  • Apache Spark
  • Apache Hive

Given our experience in related works, we are confident that we would be able to provide beneficial services to the organizations.
Sapphire currently is engaged with LEA for big data deployments on policing and safe city projects.