Introduction to Data Science
Data science dates back to the era of the Ancient Egyptians, or maybe even earlier…
Isn’t it interesting?
The Ancient Egyptians used Data Science to collect taxes more efficiently and to predict the yearly flooding of the Nile River. From there, Data Science flourished and expanded into its contemporary form.
Do you know?
- How do companies learn your secrets, such as your preference for one brand over another? How do Amazon, Daraz and eBay decide which products to show you when you search for keywords?
- How do people make estimates or predictions about current affairs?
- Or simply, how do companies find out what proportion of people use a specific mobile network?
In the contemporary world, we cannot prove a point without factual data. Producing that data is the combined work of statisticians, data scientists and economic analysts.
A massive amount of data is collected through social networks and online platforms such as Facebook, Instagram, Google, Amazon and eBay. For instance, the daily logs of Facebook add up to 60 TB of data, and eBay holds 6.5 PB of user data (figure obtained in 2009).
What is Data Science?
Data Science solves problems by using the value extracted from data.
A more detailed definition of Data Science would be:
‘A multi-disciplinary field of study that aims at managing, manipulating, extracting and interpreting knowledge from huge amounts of data.’
Data falls into two basic categories:
- Small Data:
Any data from which value is comparatively easy to extract is Small Data, or simple data.
- Big Data:
Data whose huge volume, high velocity, deep complexity and diverse variety make extracting value expensive and difficult is Big Data.
Data types vary widely according to the sources from which they are collected.
- Relational data comes from tables, transactions and legacy data.
- Text data is obtained through the Web.
- Semi-structured Data
- Semantic data obtained through social networks
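To make these four data types more concrete, here is a minimal Python sketch of what each might look like in memory. All the sample values are invented for illustration, and representing social-network data as a "who follows whom" mapping is an assumption, not something the list above specifies.

```python
import json

# Relational data: rows that all share the same columns (hypothetical table).
transactions = [
    {"order_id": 1, "customer": "A", "amount": 25.0},
    {"order_id": 2, "customer": "B", "amount": 40.0},
]

# Text data: free-form text, such as a sentence taken from a web page.
text = "Data science extracts knowledge from data"

# Semi-structured data: JSON with nested and optional fields.
raw = '{"user": "A", "tags": ["linux", "python"], "profile": {"lang": "en"}}'
record = json.loads(raw)

# Social-network data, sketched here as a mapping of who follows whom.
follows = {"A": ["B", "C"], "B": ["C"], "C": []}

# Each kind of data calls for a different way of extracting value.
total_amount = sum(row["amount"] for row in transactions)
word_count = len(text.split())
top_level_keys = sorted(record)
followers_of_c = [user for user, out in follows.items() if "C" in out]
```

The point is only that each type needs its own handling: summing a column, counting words, walking nested keys, or following links between users.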
A variety of fields and disciplines contribute theories, tactics and techniques for investigating and analysing massive amounts of data. Science, economics, finance, politics, engineering, business and education all benefit from this data. Computer science contributes pattern recognition, data warehousing, high-performance computing, databases, artificial intelligence and visualization. Mathematics and statistics contribute modelling and probability.
How is this data gathered?
Social networking, e-commerce, online sales and purchases (trading), and billing through online transactions are all sources from which massive amounts of data are collected.
What do Data Scientists do?
Josh Wills says, ‘A Data Scientist is a person who is better at statistics than any software engineer and better at software engineering than any statistician.’
Data scientists find stories in data and extract knowledge from them, but they are not reporters.
Data scientists organize large volumes of data into a more structured and compelling form. They advise executives on the implications for products, processes, and decisions.
Data scientists work in national security, cybersecurity, business analytics, engineering, healthcare and much more.
How does this Data Science work?
Firstly, the data is aggregated, with the help of statistics, for data warehousing and OLAP (online analytical processing).
Secondly, it is indexed, queried and searched to support pattern matching and keyword-based search.
Thirdly, it is used for knowledge discovery through data mining and statistical modelling.
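The first two of these steps can be illustrated with a small Python sketch: grouping records to compute summary statistics, and building an inverted index for keyword search. The sales records and the function names (`aggregate_by`, `build_index`, `search`) are hypothetical and chosen only for illustration. Real data warehouses and search engines work at a very different scale, and the third step (data mining and statistical modelling) is not covered here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records, invented for illustration only.
sales = [
    {"region": "north", "product": "blue phone case", "amount": 10.0},
    {"region": "north", "product": "red phone case", "amount": 14.0},
    {"region": "south", "product": "blue laptop bag", "amount": 30.0},
]


def aggregate_by(records, key, value):
    """Group records by `key` and report count, total and mean of `value`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    return {
        name: {"count": len(vals), "total": sum(vals), "mean": mean(vals)}
        for name, vals in groups.items()
    }


def build_index(records, field):
    """Build an inverted index: word -> positions of records containing it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for word in rec[field].lower().split():
            index[word].add(pos)
    return index


def search(index, query):
    """Return positions of records that contain every keyword in `query`."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


summary = aggregate_by(sales, "region", "amount")  # step 1: aggregation
index = build_index(sales, "product")              # step 2: indexing
hits = search(index, "blue case")                  # step 2: keyword query
```

Here `summary` holds one row of statistics per region, which is the kind of roll-up an OLAP query produces, and `hits` contains the positions of the products that match both keywords.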
The Data Science cycle starts with a problem statement (well-defined and clearly stated), moves on to data collection (primary as well as secondary), then to remediation (quality checking or inspection of the data) and data analysis. The results are finally delivered through modelling and data communication.
One final concept rounds off this basic discussion of Data Science:
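As a rough illustration of this cycle, the Python sketch below chains one small function per stage. The problem statement, the sample records and the function names are all hypothetical. The analysis is just an average rather than a real model, so treat it as a toy outline and not a full workflow.

```python
from statistics import mean

# Problem statement: well-defined and clearly stated (hypothetical example).
PROBLEM = "What is the average order amount?"


def collect():
    """Data collection: hard-coded here, but normally gathered from sources."""
    return [{"amount": "25.0"}, {"amount": ""}, {"amount": "40"}, {"amount": "abc"}]


def remediate(records):
    """Remediation: keep only the amounts that parse as numbers."""
    clean = []
    for rec in records:
        try:
            clean.append(float(rec["amount"]))
        except ValueError:
            continue  # drop records that fail the quality check
    return clean


def analyse(amounts):
    """Data analysis: a simple average stands in for a real model."""
    return {"n": len(amounts), "mean": mean(amounts)}


def communicate(problem, result):
    """Data communication: turn the result into a readable sentence."""
    return f"{problem} Based on {result['n']} valid records: {result['mean']:.2f}"


report = communicate(PROBLEM, analyse(remediate(collect())))
print(report)
```

Keeping each stage as its own function makes it easy to swap one out, for example replacing `collect` with a database query, without touching the others.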
Data science vs. Data Mining
Data Science is often confused or conflated with data mining. In fact, data mining is a subset of Data Science that analyses big data to extract useful information and value.
My name is Akhunzada Younis Said. I am a software project manager at HAZTECH, a software engineering graduate and a content writer. I love working with Linux and open-source software.