Big Data - Answer - Sets of data that are too large to be gathered and analyzed by traditional methods.
Data Governance - Answer - A framework of rules, standards, and decisions for managing data. A data governance program sets standards for, and oversees the management of, a firm's data assets in order to meet quality standards and prevent abuse.
Five Characteristics of Big Data - Answer - 1. Volume
2. Variety
3. Velocity
4. Veracity
5. Value
Sources of Big Data - Answer - Internal data - Owned, captured and stored by an organization.
External data - Belongs to an entity other than the entity that wishes to use it.
Internal Data - Answer - Data owned by the entity that uses it.
External Data - Answer - Belongs to an entity other than the entity that wishes to use it.
Structured Data - Answer - Is data organized into databases with defined fields and links between and among databases.
Unstructured Data - Answer - Is data that is not organized and that often consists of text, images, and nontraditional media.
Third-party data includes - Answer - Geodemographic data (classifications of population groups), economic data (interest rates, asset prices, exchange rates, and the consumer price index), and credit ratings.
Economic data - Answer - Includes interest rates, asset prices, exchange rates, and the consumer price index.
Geodemographic data - Answer - Data regarding classifications of population groups.
Big Data Categories - Answer - 1. External and structured - Telematics, financial data, labor statistics.
2. External and unstructured - Social media, news reports and internet videos.
3. Internal and Structured - Policy information, claims history and customer data.
4. Internal and unstructured - Adjusters' notes, customer voice recordings and surveillance videos.
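The four categories above combine two questions: where the data comes from (internal or external) and how it is organized (structured or unstructured). A minimal sketch of that grid as a lookup table follows. The names `BIG_DATA_CATEGORIES` and `categorize` are made up for illustration; the examples are the ones listed in the card.

```python
# Hypothetical lookup keyed by (source, structure); examples come from the card above.
BIG_DATA_CATEGORIES = {
    ("external", "structured"): ["telematics", "financial data", "labor statistics"],
    ("external", "unstructured"): ["social media", "news reports", "internet videos"],
    ("internal", "structured"): ["policy information", "claims history", "customer data"],
    ("internal", "unstructured"): ["adjusters' notes", "customer voice recordings",
                                   "surveillance videos"],
}


def categorize(example):
    """Return the (source, structure) pair for a listed example, or None if unlisted."""
    for category, examples in BIG_DATA_CATEGORIES.items():
        if example in examples:
            return category
    return None


print(categorize("claims history"))
```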
Predictive Modeling - Answer - Uses a defined target variable to predict or estimate an unknown outcome. Used to predict future values and estimate unknown past or present values.
Target Variable - Answer - Is the attribute whose value is being predicted in a data analytical model.
Seven Steps in Building a Predictive Model - Answer - 1. Gather historic data.
2. Divide data into training data and holdout data.
3. Build the model using the training data.
4. Apply the model to the holdout data.
5. Use performance metrics to evaluate the model.
6. Use feedback to adjust the model, repeating Steps 3, 4 and 5 as needed.
7. Put the model into production and reevaluate as needed.
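The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are randomly generated (annual miles driven and whether a claim occurred), the "model" is a single mileage threshold, and the names `build_model`, `apply_model`, and `accuracy` are hypothetical. It assumes the model is evaluated on the holdout data, which the model never sees while it is being built.

```python
import random


def build_model(training):
    """Step 3: pick the mileage threshold that classifies the training data best."""
    best_threshold, best_correct = None, -1
    for threshold, _ in training:
        correct = sum((miles >= threshold) == had_claim for miles, had_claim in training)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def apply_model(threshold, records):
    """Step 4: predict a claim when mileage meets or exceeds the threshold."""
    return [miles >= threshold for miles, _ in records]


def accuracy(predictions, records):
    """Step 5: share of predictions that match the known target values."""
    correct = sum(pred == had_claim for pred, (_, had_claim) in zip(predictions, records))
    return correct / len(records)


# Step 1: gather historical data (made up here): (annual miles in thousands, had_claim).
rng = random.Random(0)
historical = []
for _ in range(200):
    miles = rng.uniform(2, 30)
    had_claim = miles + rng.gauss(0, 4) > 15  # target variable with some noise
    historical.append((miles, had_claim))

# Step 2: divide the data into training data and holdout data.
rng.shuffle(historical)
training, holdout = historical[:150], historical[150:]

# Steps 3 to 5: build on training data, apply to holdout data, evaluate.
threshold = build_model(training)
holdout_accuracy = accuracy(apply_model(threshold, holdout), holdout)
print(f"threshold={threshold:.1f}, holdout accuracy={holdout_accuracy:.2f}")

# Steps 6 and 7 (adjusting, deploying, and reevaluating) are process steps, not shown.
```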
Training a Predictive Model - Answer - Using existing data to create a predictive model that helps an organization anticipate behaviors.
Training Data - Answer - Data that is used to train a predictive model and that therefore must have known values for the target variable of the model.
Overfitting - Answer - Occurs when the model is so closely tailored to the training data that it is not effective on other, new data.
Holdout Data - Answer - Existing data with a known target variable that is held back and not used as part of the training data. The data is used to test the model to make sure that it performs well on known data.
Generalization - Answer - A model's ability to apply itself to data outside the training data.
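A small sketch can make overfitting and generalization concrete. The "model" below deliberately overfits by memorizing every training record, so it is perfect on training data but falls back to a fixed guess on holdout data. The records and the names `train_memorizer`, `predict`, and `accuracy` are invented for illustration.

```python
def train_memorizer(training):
    """Overfit on purpose: remember every training input and its known outcome."""
    return dict(training)


def predict(memory, value, default=False):
    """Return the memorized outcome, or a fixed default for inputs never seen."""
    return memory.get(value, default)


def accuracy(memory, records):
    """Share of records whose prediction matches the known target value."""
    correct = sum(predict(memory, x) == y for x, y in records)
    return correct / len(records)


# Made-up records: (annual miles in thousands, had_claim).
training = [(5, False), (8, False), (12, False), (18, True), (22, True), (27, True)]
holdout = [(6, False), (10, False), (19, True), (25, True)]

memory = train_memorizer(training)
train_acc = accuracy(memory, training)    # 1.0: every training record is memorized
holdout_acc = accuracy(memory, holdout)   # 0.5: unseen inputs all get the default
print(train_acc, holdout_acc)
```

The gap between the two scores is the signal: a model that generalizes well would score similarly on both sets.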
Performance Metrics - Answer - Can be presented in a confusion matrix.
1. Accuracy -- Measures how often the model predicts the correct outcome.
Equals correct predictions divided by total predictions.
Accuracy = (TP+TN) / (TP+TN+FP+FN).
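As a worked example of the accuracy formula, the sketch below computes it from the four confusion matrix counts, where TP, TN, FP, and FN are the true positives, true negatives, false positives, and false negatives. The function name and the counts are made up for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions (TP + TN) divided by all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total


# Made-up counts: 40 + 45 = 85 correct predictions out of 100 total.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```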