Tuesday, September 24, 2019

What is MapReduce


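MapReduce is the batch processing engine in Hadoop. It can process only bounded data, that is, finite data sets rather than continuous streams.
Internally it processes disk to disk: intermediate results are written to disk between stages, so it is slow compared with in-memory engines.
Everything has to be optimized manually. Ecosystem tools such as Hive and Pig run on top of it to process the data.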
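
The map, shuffle, and reduce steps of a MapReduce job can be sketched in plain Python. This is a toy, single-process illustration of the programming model only, not Hadoop's API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

Calling `word_count(["big data big cluster", "data node"])` counts "big" and "data" twice each and the other words once. In a real cluster the map and reduce phases run on many machines and the intermediate pairs go through disk, which is where the slowness mentioned above comes from.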

What is YARN


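YARN (Yet Another Resource Negotiator) is Hadoop's cluster manager, sometimes described as a distributed OS. It allocates cluster resources so that huge amounts of data can be processed in parallel and quickly.
It can run different types of workloads at the same time, such as batch, streaming, iterative processing, and more.
It is a unified stack.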

What is HDFS


HDFS is a file system that stores data reliably. It consists of two types of nodes: the NameNode, which stores the metadata, and the DataNodes, which store the actual data.

HDFS is a block-structured file system. Just like Linux file systems, HDFS splits a file into fixed-size blocks, also known as partitions or splits. The default block size is 128 MB.
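
As a worked example of the block arithmetic, here is a minimal sketch that computes how a file of a given size would be cut into blocks. It assumes sizes in whole megabytes and the 128 MB default mentioned above; `split_into_blocks` is a hypothetical helper, not part of any Hadoop API.

```python
BLOCK_SIZE_MB = 128  # default block size stated in the text


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of this size is cut into."""
    if file_size_mb <= 0:
        return []
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data.
        sizes.append(remainder)
    return sizes
```

For a 300 MB file, `split_into_blocks(300)` returns `[128, 128, 44]`: two full blocks and one final block with the remaining 44 MB.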

What is Hadoop



Hadoop is one of the first popular open-source big data technologies. It is a scalable,
fault-tolerant system for processing large datasets across a cluster of commodity hardware.

Internal components:
HDFS and YARN, with MapReduce

Software Testing Strategies

Testing Strategy:
- A strategic approach to testing
- Test strategies for conventional software
- Test strategies for object-oriented software
- Validation testing
- System testing
- The art of debugging

Introduction:
• A strategy for software testing integrates the design of software test cases into a well-planned series of steps that result in successful development of the software 
• The strategy provides a roadmap that describes the steps to be taken, when, and how much effort, time, and resources will be required 
• The strategy incorporates test planning, test case design, test execution, and test result collection and evaluation 
• The strategy provides guidance for the practitioner and a set of milestones for the manager 
• Because of time pressures, progress must be measurable and problems must surface as early as possible

General Characteristics of Strategic Testing:
• To perform effective testing, a software team should conduct effective formal technical reviews 
• Testing begins at the component level and works outward toward the integration of the entire computer-based system 
• Different testing techniques are appropriate at different points in time 
• Testing is conducted by the developer of the software and (for large projects) by an independent test group 
• Testing and debugging are different activities, but debugging must be accommodated in any testing strategy 
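
The idea of starting at the component level and working outward can be illustrated with a small sketch using Python's standard `unittest` module. The two functions under test, `parse_amount` and `total_cents`, are hypothetical components invented for this example; the first test checks one component in isolation, the second checks the two components working together.

```python
import unittest


def parse_amount(text):
    """Component 1 (hypothetical): convert a string such as '12.50' to cents."""
    return round(float(text) * 100)


def total_cents(amounts):
    """Component 2 (hypothetical): sum a list of textual amounts, in cents."""
    return sum(parse_amount(a) for a in amounts)


class ComponentLevelTest(unittest.TestCase):
    """Tests a single component in isolation."""

    def test_parse_amount(self):
        self.assertEqual(parse_amount("12.50"), 1250)


class IntegrationLevelTest(unittest.TestCase):
    """Tests the components working together."""

    def test_total_cents(self):
        self.assertEqual(total_cents(["1.00", "2.25"]), 325)
```

Saved to a file, the tests can be run with `python -m unittest`. If the component-level test fails, the problem is localized before the integration-level test is even considered, which is the point of working outward.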

Verification and Validation:
• Software testing is part of a broader group of activities called verification and validation that are involved in software quality assurance 
• Verification (Are the algorithms coded correctly?) – The set of activities that ensure that software correctly implements a specific function or algorithm 
• Validation (Does it meet user requirements?) – The set of activities that ensure that the software that has been built is traceable to customer requirements

Organizing for Software Testing:
• Testing should aim at "breaking" the software 
• Common misconceptions 
– The developer of software should do no testing at all
– The software should be given to a secret team of testers who will test it unmercifully
– The testers get involved with the project only when the testing steps are about to begin
• Reality: Independent test group 
– Removes the inherent problems associated with letting the builder test the software that has been built 
– Removes the conflict of interest that may otherwise be present 
– Works closely with the software developer during analysis and design to ensure that thorough testing occurs

A Strategy for Testing Conventional Software: