AIDA 181 Study Guide 2023 with complete solution

big data - sets of data that are too large to be gathered and analyzed by traditional methods
5 Vs of big data - volume, variety, velocity, veracity, value
volume - far larger (consumes more storage space) than traditional data stores
variety - much of the data exists in multiple unstructured formats (defined later)
velocity - the speed at which the data arrives and the rate at which it evolves
veracity - the completeness and accuracy are far lower than for traditional data
value - the potential is large but requires sophisticated data science techniques to extract the value
text mining - obtaining information through language recognition
data mining - analysis of large amounts of data to find new relationships and patterns that will assist in developing business solutions
telematics - the use of technological devices in vehicles with wireless communication and GPS tracking that transmit data to businesses or government agencies; some return information for the driver
Internet of Things examples - a network of objects that transmit data to computers, e.g. TVs, refrigerators
machine learning - AI in which computers continually teach themselves to make better decisions based on previous results and new data
artificial intelligence - computer processing/output that simulates reasoning or knowledge
why do tech companies use AI - to "solve problems or to introduce new products"
data science - an interdisciplinary field involving the design and use of techniques to process very large amounts of data from a variety of sources and to provide knowledge based on the data
structured data example - data organized into databases with defined fields, including links between databases, e.g. Microsoft Access
unstructured data - data that is not organized into predetermined formats such as databases; often consists of text, images, or other nontraditional media
internal data examples - data owned by an organization, e.g. premium records, adjuster photos
external data - data that belongs to an entity other than the organization that wishes to acquire and use it
describe economic data in terms of external data - data regarding interest rates, asset prices, exchange rates, the Consumer Price Index, and other information about the global, national, or regional economy
geodemographic data - data regarding classifications of a population
what is the motto for data quality - garbage in, garbage out
list some examples of typical data quality problems - missing data / NULL values; duplicate records; default values rather than actual values
ways to detect data quality problems - descriptive statistics; data cubes (illustrated in the sketch below)
descriptive statistics - quantitative summaries of the characteristics of a data set, such as the total or average
data cubes - a multidimensional partitioning of data into two or more categories
metadata - data about the data that provides context for analyzing transaction facts, with efficient structures for grouping hierarchical information
what type of data helps prevent data quality issues? - metadata
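To make the two detection methods above concrete, here is a minimal sketch in Python, assuming a hypothetical pandas DataFrame of premium records; the column names and values are invented for illustration.

    import pandas as pd

    # Hypothetical premium records (illustrative values only).
    records = pd.DataFrame({
        "policy_id": [101, 102, 102, 104],       # 102 appears twice: duplicate record
        "premium":   [500.0, None, 650.0, 0.0],  # NULL value and a suspicious default
        "state":     ["TX", "NY", "NY", "TX"],
    })

    # Descriptive statistics: quantitative summaries such as totals and averages.
    print(records["premium"].describe())

    # Typical data quality problems: missing/NULL values, duplicate records,
    # and default values (here 0.0) standing in for actual values.
    print("missing premiums:", records["premium"].isna().sum())
    print("duplicate policy ids:", records.duplicated(subset="policy_id").sum())
    print("default-valued premiums:", (records["premium"] == 0.0).sum())

A data cube would extend the same idea by summarizing the premiums across two or more categories at once (for example, state by policy year).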
what are some data mining techniques - classification; regression analysis; association rule learning; cluster analysis
what data mining techniques are used when looking for a specific answer - classification; regression analysis
what data mining techniques are used when you want to make a new discovery - association rule learning; cluster analysis
CRISP-DM - Cross Industry Standard Process for Data Mining; an accepted standard for the steps in any data mining process used to provide business solutions
data science fundamental concepts - systematic processes can be used to discover useful knowledge from data; information technology can be applied to big data to reveal the characteristics of the topics of interest; analyzing data too closely can result in interesting findings that are not generally applicable
data-driven decision making - an organizational process to gather and analyze relevant and verifiable data and then evaluate the results to guide business strategies
purpose of data-driven decision making - automate decision making for improved accuracy and efficiency; organize large volumes of new data; discover new relationships in data
what approaches are commonly used in data-driven decision making - descriptive approach; predictive approach
descriptive approach, example of a problem it is used for - to solve a specific problem or answer a specific question, e.g. has the loss ratio improved by exiting certain states or regions?
predictive approach / supervised learning - a type of model creation, derived from the field of machine learning, in which the target variable is defined
purpose of supervised learning - answers a specific question; you need to know what data to use
unsupervised learning - a type of model creation, derived from the field of machine learning, that does not have a defined target in mind
predictive model - a model used to predict an unknown outcome by means of a defined target variable (see the sketch below)
where does a predictive model come from, and when would it be used? - generally developed by using supervised learning, e.g. to predict personal auto loss costs
attribute - a variable that describes a characteristic of an instance within a model
instance - a representation of a data point, described by a set of attributes, within a model's dataset
target variable - the predefined attribute whose value is being predicted in a data analytical model
class label - the value of the target variable in a model
what is a descriptive model? what is the outcome? - a model used to study and find relationships within data; no specific course of action is recommended; it will lead to more specific questions that can be answered with a predictive model
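The supervised learning cards above lend themselves to a short illustration. This is a minimal sketch, assuming scikit-learn is available; the attributes (driver_age, prior_claims), instances, and class labels are invented, not from the study guide.

    from sklearn.tree import DecisionTreeClassifier

    # Each row is an instance; each column is an attribute (driver_age, prior_claims).
    X_train = [[25, 2], [45, 0], [30, 1], [55, 0], [22, 3]]
    # The defined target variable, with known class labels: 1 = claim, 0 = no claim.
    y_train = [1, 0, 1, 0, 1]

    # Supervised learning: the model is trained against the defined target variable.
    model = DecisionTreeClassifier(max_depth=2)
    model.fit(X_train, y_train)

    # The trained predictive model assigns a class label to a new, unseen instance.
    print(model.predict([[40, 1]]))

Unsupervised learning would instead be run on X_train alone, with no y_train defined.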
entropy - a measure of disorder in a dataset; a measure of how unpredictable something is
information gain - a measure of the predictive power of one or more attributes; how much information a model can provide about the target variable
lift - in model performance evaluation: lift = (positive predictions made by the model) / (positive predictions that would be made in the absence of the model)
leverage - (percentage of positive predictions made by the model) - (percentage of positive predictions that would be made in the absence of the model)
similarity - a measure of how alike two data instances are
distance - a way to indicate the inherent similarity of data points
nearest neighbor - the most similar instance in a data model
k-nearest neighbor - an algorithm in which "k" equals the number of nearest neighbors plotted on the graph
combining function - a combination of two functions to form a new function
majority combining function - gives equal weight to all of the nearest neighbors regardless of distance
link prediction - a prediction of the connection between data items; used in measuring similarity in networks
training data - data that is used to train a predictive model and that therefore must have known values for the target variable of the model
holdout data - in the model training process, existing data with a known target variable that is not used as part of the training data
why have holdout data - to avoid overfitting and to test generalization
overfitting - fitting a model too closely to the training data
generalization - the ability of a model to apply itself to data outside of the training data
accuracy equation - (TP + TN) / (TP + TN + FP + FN), where TP = true positives, TN = true negatives, FP = false positives, FN = false negatives
precision equation - TP / (TP + FP)
recall - TP / (TP + FN)
F-score - 2 * (precision * recall) / (precision + recall) (these four metrics are implemented in the sketch below)
what are some traditional methods of supervised learning - regression; classification; ranking
regression analysis - a statistical technique that is used to estimate relationships between variables
linear function - a mathematical model in which the value of one variable changes in direct proportion to the change in the value of another variable
linear equation - y = mx + b, where y = the function f(x), the dependent variable; m = slope; x = independent variable; b = y-intercept
linear regression - the statistical method to predict the numerical value of a target variable based on the values of explanatory variables
logistic regression - a type of generalized linear model in which the predicted values are probabilities
linear discriminant - a data analysis technique that uses a line to separate data into two classes
instance space - an area of two or more dimensions that illustrates the distribution of instances
generalized linear model - a statistical technique that increases the flexibility of a linear model by linking it to a nonlinear function
generalized linear model equation - y = h(x) + e, where h = link function, a mathematical function that describes how the random values of the target variable depend on the mean value generated by the linear combination of the explanatory variables; x = systematic component, the linear combination of the explanatory variables; e = random component, the probability distribution of the response variable
output of a one-attribute model - statistical output is used to assess the goodness of fit of the model; it helps to answer questions such as: Is the overall model statistically significant? Are the predictor variables statistically significant?
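The four evaluation formulas above translate directly into code. A minimal sketch, using hypothetical confusion-matrix counts from an imagined holdout set:

    def accuracy(tp, tn, fp, fn):
        # (TP + TN) / (TP + TN + FP + FN)
        return (tp + tn) / (tp + tn + fp + fn)

    def precision(tp, fp):
        # TP / (TP + FP)
        return tp / (tp + fp)

    def recall(tp, fn):
        # TP / (TP + FN)
        return tp / (tp + fn)

    def f_score(p, r):
        # 2 * (precision * recall) / (precision + recall)
        return 2 * p * r / (p + r)

    # Hypothetical counts: 40 true positives, 30 true negatives,
    # 10 false positives, 20 false negatives.
    tp, tn, fp, fn = 40, 30, 10, 20
    p, r = precision(tp, fp), recall(tp, fn)
    print(accuracy(tp, tn, fp, fn), p, r, f_score(p, r))  # 0.7 0.8 0.666... 0.727...

Because the counts come from holdout data rather than training data, these scores measure generalization rather than how closely the model was fit to its training set.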
exploratory data analysis - charts/graphs that show data patterns and correlations among data
correlation matrix - a matrix that compares attributes; the stronger the correlation, the darker the cell's color shade in the matrix
traditional methods - methods historically relied on by insurers and risk managers; usually involve structured data
new data analysis techniques - evolving out of the convergence of big data and new technology; usually use unstructured data
classification trees - a supervised learning technique that uses a structure similar to a tree to segment data according to known attributes to determine the value of a categorical target variable
node - a representation of a data attribute
arrow - a pathway in a classification tree
root node - the first node in a classification tree
leaf node - a terminal node of a classification tree that is used to classify an instance based on its attributes
classification analysis - a supervised learning technique to segment data according to the values of known attributes to determine the value of a categorical target variable
least squares regression - a type of linear regression that minimizes the sum of the squared values of the errors based on the differences between actual observed and estimated (mean) values
multiple regression - a variation of linear regression whereby two or more attributes (explanatory variables) are used for prediction
assumptions of linear models - variance around the mean of the target variable is uniformly distributed ("normality"); homoskedasticity: the variance of the model errors around the estimated mean of the target variable is the same regardless of the size of the estimated mean
cluster analysis - a model that determines previously unknown groupings of data
association rule learning - examining data to discover new and interesting relationships among attributes that can be stated as business rules
text mining - developing models that analyze text (see the sketch below)
corpus - a collection of writings or documents
token - in a text mining context, a meaningful term or group of terms
stemming - the removal of a suffix from a term for the purpose of text mining
stopword - a common word that offers little meaning and is filtered out in a text mining context
network analysis - the study of nodes and edges in a network
sociogram - a graphical representation of a social network
node - in network analysis, a point in a network
centrality measure - in a social network context, the quantification of a node's relationship to other nodes in the same network
local variable - in a social network analysis, an attribute that relates to the characteristics of a specific node
network variable - in a social network analysis, an attribute that relates to the node's connections within the network
neural network - a data analysis technique composed of three layers (an input layer, a hidden layer with nonlinear functions, and an output layer) that is used for complex problems; expressed in numerical data only
what are some primary rating factors - territory; use of auto; age and gender; marital status
what are commercial vehicles rated on? - vehicle rate and type; vehicle use; radius of operation; special industry classifications
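The text-mining terms above (token, stemming, stopword) can be shown in a few lines. A minimal sketch, assuming a one-document corpus and a deliberately simplified stopword list and stemming rule:

    # Simplified stopword list for illustration.
    STOPWORDS = {"the", "a", "of", "was", "and", "to"}

    def tokenize(document):
        # Split the document into lowercase tokens (meaningful terms).
        return document.lower().split()

    def stem(token):
        # Crude stemming: strip a common suffix from the term.
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    corpus = ["The insured reported flooding of the basement"]
    for document in corpus:
        tokens = [stem(t) for t in tokenize(document) if t not in STOPWORDS]
        print(tokens)  # ['insur', 'report', 'flood', 'basement']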
vehicle telematics - the use of technological devices in vehicles with wireless communication and GPS tracking that transmit data to businesses or government agencies; some return information for the driver
usage-based insurance - a type of auto insurance in which the premium is based on the policyholder's driving behavior
loss ratio - a ratio that measures losses and loss adjustment expenses against earned premiums; reflects the percentage of premiums being consumed by losses (see the sketch below)
traditional rating variables for homeowners rating include - year built; construction type; location
in homeowners segmentation, what correlates with low rate increases? high rate increases? what if there is no segmentation? - lower rate increases for low loss ratios, higher for high loss ratios; with no segmentation, all policies get the same rate increase
who has product liability? - any business that makes, sells, distributes, or even gives away products has loss exposures that can lead to product liability losses
what are some common ways to identify fraudulent claims? - through traditional fraud indicators and through social media data; apply network analysis by examining links and suspicious connections; apply cluster analysis to discover claims characteristics that might indicate fraud
what is a complex claim? - a claim that contains one or more characteristics that cause it to cost more than the average claim
what is workers compensation? - reparation for employee costs of a work-related injury or illness that arose from and occurred during the course of employment
no-fault coverage - benefits received are not conditional on how the accident happened (employee error or employer error)
system safety analysis - examines an organization as a whole to identify potential sources of accidents
root cause analysis - determining the event or circumstance that directly leads to an accident
failure mode, effects, and criticality analysis (FMEA) - allows an organization to determine which risks to address and in what priority
system safety - a safety engineering technique, also used as an approach to accident causation, that considers the mutual effects of the interrelated elements of a system on one another throughout the system's life cycle
root cause analysis - a systematic procedure that uses the results of the other analysis techniques to identify the predominant cause of the accident
what are the limitations of fault tree analysis - there is a high degree of uncertainty with the underlying/base events, so the probability of the accident is hard to find; important pathways to the accident might not be explored if some causal events are not included in the fault tree
what can traditional analysis not account for? - human error; conditional failures
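The loss ratio definition above is simple arithmetic; a one-function sketch with invented figures for losses, loss adjustment expenses, and earned premium:

    def loss_ratio(losses, loss_adjustment_expenses, earned_premium):
        # Loss ratio = (losses + LAE) / earned premiums: the percentage of
        # premiums being consumed by losses.
        return (losses + loss_adjustment_expenses) / earned_premium

    # Hypothetical book of business: 600,000 in losses, 50,000 in LAE,
    # 1,000,000 in earned premium.
    print(f"{loss_ratio(600_000, 50_000, 1_000_000):.1%}")  # 65.0%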
reputation risk - the risk that negative publicity, whether true or not, will damage a company's reputation and its ability to operate its business
closeness - the measure of the average distance, or path length, between a particular node and the other nodes in a network
degree - the measure of the number of connections each node has
betweenness - the measure of how many times a particular node is part of the shortest path between two other nodes in a network (see the sketch at the end of this list)
incurred losses - the sum of the paid losses, loss reserves, and loss adjustment expense reserves
ultimate loss - the final paid amount for all losses in an accident year
long-tail claim - a claim in which there is a duration of more than one year from the date of loss to the date of closure
adverse development - increasing claims costs for which the reserves (based on ultimate loss estimates) are inadequate
excess liability insurance - insurance coverage for losses that exceed the limits of underlying coverage or a retention amount
what are cluster analysis outliers? - claims reported more than a week after the date of the accident; claims representing liability denied by the claims rep; claims where the claimant is represented by an attorney
law of large numbers - a mathematical principle stating that as the number of similar but independent exposure units increases, the relative accuracy of predictions about future outcomes (losses) also increases
text mining vs data mining - text mining: obtaining information through language recognition; data mining: analysis of large amounts of data to find new relationships and patterns that will assist in developing business solutions
telematics - the use of technological devices in vehicles with wireless communication and GPS tracking that transmit data to businesses or government agencies
internet of things - a network of objects that transmits data to computers
statistical plan - a formal set of directions for recording and reporting insurance premiums, exposures, losses, and sometimes loss expenses to a statistical agent
descriptive statistics - quantitative summaries of the characteristics of a dataset, such as the total or the average
data cube - a multidimensional partitioning of data into two or more categories
malware - malicious software, such as a virus, that is transmitted from one computer to another to exploit system vulnerabilities in the targeted computer
Cross Industry Standard Process for Data Mining (CRISP-DM) - an accepted standard for the steps in any data mining process used to provide business solutions
compare big data 1.0 to big data 2.0 - big data 1.0: companies started using the internet to conduct business and compile data about customers, and used that data from apps to improve underwriting efficiency and provide customer information for product development and marketing; big data 2.0: allows organizations to obtain and aggregate vast amounts of data very quickly and extract useful knowledge from it, and also to process and analyze data from sources such as vehicles and wearable technology
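The centrality measures defined above (degree, closeness, betweenness) can be illustrated on a toy network. A minimal sketch, assuming a small hypothetical network stored as an adjacency list; betweenness is omitted for brevity:

    from collections import deque

    # Hypothetical network: nodes might be claimants, edges shared details.
    graph = {
        "A": ["B", "C"],
        "B": ["A", "C", "D"],
        "C": ["A", "B"],
        "D": ["B"],
    }

    def degree(node):
        # Degree: the number of connections the node has.
        return len(graph[node])

    def closeness(node):
        # Closeness: the average path length between this node and all
        # other nodes, found here with a breadth-first search.
        dist = {node: 0}
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        others = [d for n, d in dist.items() if n != node]
        return sum(others) / len(others)

    for n in graph:
        print(n, "degree:", degree(n), "closeness:", round(closeness(n), 2))

Node B has the most connections and the shortest average path length, which is the kind of node an analyst might flag first when examining links and suspicious connections for fraud.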
