Big Data Analytics: Platforms and Processing Frameworks

Data Analytics and Business Intelligence

October 25, 2025

Introduction

This course covers the big data platforms and processing frameworks used to handle massive datasets. Participants learn to work with distributed computing systems and to process large-scale data with modern big data technologies. The curriculum covers the Hadoop ecosystem, Spark, NoSQL databases, and cloud-based big data solutions. Through hands-on labs and projects, learners develop the skills to design and implement scalable data processing pipelines that handle terabytes of data efficiently.

Objectives

Key learning objectives include:

Understand big data architecture and ecosystem components
Deploy and operate the Hadoop Distributed File System (HDFS)
Process data using MapReduce and Spark frameworks
Work with NoSQL databases for big data storage
Develop scalable data processing pipelines
Optimize big data workflows for performance
Implement real-time streaming data processing
Manage big data clusters and resources

Target Audience

Data engineers and architects
Big data developers
Data scientists working with large datasets
IT professionals managing data infrastructure
Software engineers building data-intensive applications
System administrators
Cloud data engineers

Methodology

The course uses a combination of theoretical concepts and extensive hands-on labs with big data platforms. Participants work with real large datasets in cloud environments to practice distributed processing. Case studies from web analytics, IoT, and social media provide context for big data applications. Group activities focus on designing scalable architectures, while individual exercises build technical skills. Mini-case studies present specific big data challenges, and syndicate discussions explore solution patterns and best practices.

Personal Impact

Enhanced ability to work with large-scale data systems
Improved skills in distributed computing frameworks
Stronger understanding of big data architecture
Increased proficiency with cloud data platforms
Better problem-solving for scalability challenges
Stronger ability to design data processing pipelines

Organizational Impact

Ability to process and analyze massive datasets
Improved scalability of data infrastructure
Reduced processing time for large-scale analytics
Enhanced capabilities for real-time data processing
Better cost management for big data workloads
Increased competitive advantage through big data insights

Course Outline

Unit 1: Big Data Fundamentals

Core Concepts

Characteristics of big data (Volume, Velocity, Variety)
Big data architecture patterns
Distributed computing principles
Big data use cases and business value

Unit 2: Hadoop Ecosystem

Hadoop Core Components

HDFS architecture and operations
MapReduce programming model
YARN resource management
Hadoop cluster administration

Hadoop Tools

Hive for SQL-like querying
Pig for data flow processing
HBase for NoSQL storage
Sqoop for data transfer

Unit 3: Spark Processing Framework

Spark Fundamentals

Spark architecture and RDDs
DataFrame and Dataset APIs
Spark SQL for structured processing
Spark cluster management

Advanced Spark

Spark Streaming for real-time data
Machine Learning with MLlib
Graph processing with GraphX
Performance optimization techniques

Unit 4: NoSQL Databases

NoSQL Categories

Document databases (MongoDB)
Column-family stores (Cassandra)
Key-value stores (Redis)
Graph databases (Neo4j)

Unit 5: Streaming Data Processing

Real-time Analytics

Stream processing concepts
Apache Kafka for event streaming and messaging
Apache Flink for stream processing
Apache Storm for real-time computation

Unit 6: Cloud Big Data Platforms

Cloud Solutions

AWS EMR and Athena
Azure HDInsight and Databricks
Google BigQuery and Dataflow
Cloud data lake architectures

Bangkok

February 09, 2026 - February 11, 2026