Data in Different Forms

Data comes in many shapes and sizes. As a data analyst, you will encounter data in various formats. Understanding these formats is the first step in being able to process and analyze the information within.

It's helpful to think of data organization as a spectrum. On one end is highly structured data (like a formal database table), and on the other is completely unstructured data (like a plain text document). Semi-structured data falls in between. For practical purposes, we can group data into these three main forms.


1. Structured Data

This is data stored in a fixed, organized layout of rows and columns, typically in a relational database or a spreadsheet. It adheres to a predefined data model that specifies which fields every record has.

  • Characteristics: Highly organized, easy to query and analyze.
  • Examples: Transactional data from an e-commerce website, customer records in a CRM.
  • Common Formats: CSV, Excel Spreadsheets, SQL Databases.
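
To illustrate what "predefined data model" and "easy to query" mean in practice, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the three rows are hypothetical and invented for this example; amounts are stored as whole cents to keep the arithmetic exact.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# The data model is declared up front: every row must have these three fields.
conn.execute(
    """
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        customer     TEXT    NOT NULL,
        amount_cents INTEGER NOT NULL
    )
    """
)

# Hypothetical transactional records.
rows = [(1, "alice", 1999), (2, "bob", 500), (3, "alice", 4250)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Because the structure is known in advance, a question becomes a short query.
totals = conn.execute(
    "SELECT customer, SUM(amount_cents) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)

conn.close()
```

The same question asked of free-form text would first require extracting the customer and amount from each record, which is the main practical difference between structured and unstructured data.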

CSV (Comma-Separated Values)

CSV is sometimes classed as semi-structured because, unlike a database, it does not enforce data types: every value is stored as plain text. Even so, it is the most widely used way to represent tabular, structured data in a flat file, organized into rows and columns.

  • The first line is often the header row.
  • Each new line corresponds to a new row of data.
  • Can be opened by any spreadsheet program or text editor.

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
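
The sample rows above can be read with Python's built-in csv module. This is a minimal sketch: the rows are held in a string so the example is self-contained, whereas a real analysis would more likely open a file. It shows two details worth knowing: quoted fields may contain commas, and every value arrives as a string until you convert it yourself.

```python
import csv
import io

# The sample rows from above, kept in a string so the sketch runs on its own.
raw = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
'''

# DictReader uses the header row as the keys for each subsequent row.
reader = csv.DictReader(io.StringIO(raw))
passengers = list(reader)

first = passengers[0]
print(first["Name"])         # the quoted field keeps its embedded comma
print(repr(first["Age"]))    # a string, not a number
print(repr(first["Cabin"]))  # a missing value is an empty string

# CSV carries no type information, so conversion is the reader's job.
ages = [int(p["Age"]) for p in passengers]
print(ages)
```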

2. Unstructured Data

This is data, often generated by human activity, that doesn't fit into a structured database format. It has no predefined data model.

  • Characteristics: Qualitative, lacks a clear structure, requires more advanced techniques (like NLP or computer vision) to analyze.
  • Examples: Facebook posts, emails, tweets, images, videos, audio files.
  • Common Formats: .txt, .doc, .jpg, .png, .mp3, .mp4.
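
As a rough illustration of why unstructured data needs extra processing, the sketch below turns a piece of free text into word counts using only the standard library. The post text is invented for this example, and splitting on letters with a regular expression is a deliberately simple stand-in for real NLP tooling, not a recommended method.

```python
import re
from collections import Counter

# Hypothetical free-text post, invented for illustration.
post = "Loved the ferry ride! The ferry staff were great, and the view was great too."

# There are no columns to select, so structure has to be imposed first:
# lowercase the text and pull out runs of letters as words.
words = re.findall(r"[a-z']+", post.lower())

# Only after that step can the text be summarized numerically.
counts = Counter(words)
print(counts["ferry"], counts["great"], counts["the"])
```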

3. Semi-Structured Data

This data doesn't fit neatly into the rigid rows and columns of a traditional database, but it contains tags or other markers that separate data elements and express the hierarchies and relationships between them.

  • Characteristics: More flexible than structured data, has a self-describing structure.

  • Examples: HTML pages, JSON files, XML documents.

  • Common Formats:

    • JSON (JavaScript Object Notation): A lightweight format using human-readable key-value pairs. Very common for web APIs.

      {
        "PassengerId": 1,
        "Survived": 0,
        "Pclass": 3,
        "Name": "Braund, Mr. Owen Harris",
        "Sex": "male",
        "Age": 22
      }
      
    • XML (eXtensible Markup Language): A markup language that uses tags to define elements. More verbose than JSON but still widely used.

      <Passenger>
        <PassengerId>1</PassengerId>
        <Survived>0</Survived>
        <Pclass>3</Pclass>
        <Name>Braund, Mr. Owen Harris</Name>
        <Sex>male</Sex>
        <Age>22</Age>
      </Passenger>
      
    • HTML (HyperText Markup Language): The standard markup language for creating web pages. It's a form of semi-structured data because the tags define a structure.

      <!DOCTYPE html>
      <html>
      <head>
        <title>Passenger List</title>
      </head>
      <body>
      
        <h1>Passenger Information</h1>
        <p><b>Name:</b> Braund, Mr. Owen Harris</p>
        <p><b>Age:</b> 22</p>
      
      </body>
      </html>
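
The three formats above can all be read with Python's standard library. The following is a minimal sketch, assuming the samples are available as strings (shortened here to keep the example compact). `ParagraphCollector` is a hypothetical helper written for this example, not part of any library.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# JSON: keys and values map directly onto a Python dict, and numbers stay numbers.
json_text = '{"PassengerId": 1, "Name": "Braund, Mr. Owen Harris", "Age": 22}'
record = json.loads(json_text)

# XML: tags form a tree; element text is always a string and must be converted.
xml_text = """<Passenger>
  <PassengerId>1</PassengerId>
  <Name>Braund, Mr. Owen Harris</Name>
  <Age>22</Age>
</Passenger>"""
root = ET.fromstring(xml_text)
xml_name = root.findtext("Name")
xml_age = int(root.findtext("Age"))


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the paragraph.
        if self.in_p:
            self.paragraphs[-1] += data


# HTML: the tags describe presentation, so the data has to be dug out of them.
html_text = (
    "<html><body><h1>Passenger Information</h1>"
    "<p><b>Name:</b> Braund, Mr. Owen Harris</p>"
    "<p><b>Age:</b> 22</p></body></html>"
)
collector = ParagraphCollector()
collector.feed(html_text)
collector.close()

print(record["Age"], xml_age, collector.paragraphs)
```

Note the gradient of effort: JSON yields typed values immediately, XML yields a tree of strings, and HTML needs custom logic to recover the same two facts.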
      