Almost anything can be turned into data. Building a solid understanding of the different data types is an important prerequisite for doing Exploratory Data Analysis (EDA) and Feature Engineering for Machine Learning models. You may also need to convert the data types of some variables in order to make proper choices for visual encodings in data visualization and storytelling.
From a Machine Learning perspective, most data can be categorized into 4 basic types: numerical data, categorical data, time-series data, and text.
Numerical data is any data where the data points are exact numbers. Statisticians may also call numerical data quantitative data. This data has meaning as a measurement, such as house prices, or as a count, such as the number of residential properties in Los Angeles or how many houses sold in the past year.
Numerical data can be characterized as continuous or discrete. Continuous data can assume any value within a range, whereas discrete data has distinct values.
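To make the continuous versus discrete distinction concrete, here is a minimal sketch with made-up values: house prices as a continuous measurement and houses sold per month as a discrete count. The variable names, the figures, and the `looks_discrete` helper are hypothetical, and the whole-number check is only a rough heuristic, not a formal test.

```python
from statistics import mean

# Hypothetical values for illustration only.
house_prices = [349999.99, 512500.50, 289000.00]  # continuous: any value in a range
houses_sold_per_month = [12, 7, 15]               # discrete: distinct whole counts

def looks_discrete(values):
    """Rough heuristic: treat a column as discrete if every value is a whole number."""
    return all(float(v).is_integer() for v in values)

print(looks_discrete(house_prices))           # False
print(looks_discrete(houses_sold_per_month))  # True

# Arithmetic such as the mean is meaningful for both kinds of numerical data.
average_price = mean(house_prices)
average_sold = mean(houses_sold_per_month)
```

A column of whole numbers is not always a count (it could be an ID or a category code), so a check like this is a starting point for inspection rather than a decision rule.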
Categorical data represents characteristics, such as a hockey player's position, team, or hometown. Categorical data can take numerical values. For example, we might use 1 for the color red and 2 for blue. But these numbers don't have a mathematical meaning. That is, we can't add them together or take the average.
In the context of supervised classification, categorical data would be the class label. This could also be something like whether a person is a man or a woman, or whether a property is residential or commercial.
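The point about numeric codes having no mathematical meaning can be shown with a short sketch. The color list and the code mapping below are hypothetical, and one-hot encoding is shown as one common way to represent categories for a model, not the only one.

```python
from collections import Counter

# Hypothetical color labels for illustration.
colors = ["red", "blue", "blue", "red", "blue"]

# Integer codes are only identifiers: 1 = red, 2 = blue.
codes = {"red": 1, "blue": 2}
encoded = [codes[c] for c in colors]

# Averaging the codes gives 1.6, which corresponds to no color at all.
misleading_average = sum(encoded) / len(encoded)

# Counting how often each category occurs is meaningful.
counts = Counter(colors)

# One-hot encoding: one 0/1 column per category, implying no order or distance.
categories = sorted(set(colors))  # ['blue', 'red']
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]
```

Here `one_hot[0]` is `[0, 1]` because the first label is red, and no arithmetic relationship between red and blue is implied.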
There is also something called ordinal data, which in some sense is a mixture of numerical and categorical data. In ordinal data, the data still falls into categories, but those categories are ordered or ranked in some particular way. An example would be class difficulty, such as beginner, intermediate, and advanced. Those three types of classes would be the way that we could label the classes, and they have a natural order of increasing difficulty.
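A minimal sketch of the beginner, intermediate, advanced example follows. The class list is hypothetical; the idea is simply that the position of each level in an ordered list gives it a rank that can be compared and sorted.

```python
# Ordered levels; the position in the list encodes the rank.
LEVELS = ["beginner", "intermediate", "advanced"]
rank = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical list of classes on offer.
classes = ["advanced", "beginner", "intermediate", "beginner"]

# Sorting and comparing by rank is meaningful for ordinal data.
by_difficulty = sorted(classes, key=rank.get)
hardest = max(classes, key=rank.get)

print(by_difficulty)  # ['beginner', 'beginner', 'intermediate', 'advanced']
print(hardest)        # advanced
```

The ranks capture order only. The gap between beginner and intermediate is not necessarily the same as the gap between intermediate and advanced, so averaging ranks should be done with care.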
Time Series Data
Time series data is a sequence of numbers collected at regular intervals over some period of time. It is very important, especially in fields like finance. Time series data has a temporal value attached to it, such as a date or a timestamp, so that you can look for trends over time.
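As a rough illustration, the sketch below pairs made-up daily prices with dates and smooths them with a trailing moving average, one simple way to look for a trend. The prices, the start date, and the `moving_average` helper are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical daily closing prices, one observation per day (a regular interval).
start = date(2024, 1, 1)
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0]
series = [(start + timedelta(days=i), p) for i, p in enumerate(prices)]

def moving_average(points, window):
    """Trailing moving average over `window` consecutive observations."""
    out = []
    for i in range(window - 1, len(points)):
        chunk = [value for _, value in points[i - window + 1 : i + 1]]
        out.append((points[i][0], sum(chunk) / window))
    return out

smoothed = moving_average(series, window=3)

for day, value in smoothed:
    print(day.isoformat(), round(value, 2))
```

Each smoothed value is stamped with the date of the last observation in its window, so the first result belongs to 2024-01-03. The rising smoothed values suggest an upward trend in this toy series.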
Text data is basically just words. A lot of the time, the first thing that you do with text is turn it into numbers using some interesting functions, like the bag of words formulation.
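Here is a minimal bag of words sketch using only the Python standard library. The two sentences, the simple letters-only tokenizer, and the helper names are assumptions made for illustration; real pipelines usually rely on a library and handle punctuation, stop words, and so on.

```python
import re
from collections import Counter

# Hypothetical mini-corpus.
documents = [
    "The house sold quickly",
    "The house is a commercial property",
]

def tokenize(text):
    """Lowercase the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())

# Vocabulary: every distinct word across all documents, in sorted order.
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})

def bag_of_words(text):
    """Word-count vector aligned with `vocabulary`; word order is discarded."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(doc) for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a fixed-length list of counts, one position per vocabulary word, which is the kind of numerical input a model can work with.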
These are the four types of data from a Machine Learning perspective. Depending on the exact type of data, there may be repercussions for the type of algorithms that you can use for feature engineering and modeling, or the type of questions that you can ask of it.
Let me know if you have any questions or comments. I would like to write an article about feature engineering based on different data types in the future. Thank you for reading.