Exam (elaborations)

Contextualizing Geometric Data Analysis and Related Data Analytics: A Virtual Microscope for Big Data Analytics

Uploaded on

26-07-2024

Written in

2023/2024

Contextualizing Geometric Data Analysis and Related Data Analytics: A Virtual Microscope for Big Data Analytics

Fionn Murtagh, Mohsen Farid
Big Data Lab, Department of Electronics, Computing and Mathematics, University of Derby, Derby, UK
Email:
February 16, 2022
arXiv:1611.09948v1 [cs.AI] 30 Nov 2016

Abstract

An objective of this work is to contextualize the analysis of large and multi-faceted data sources. Consider, for example, health research in the context of social characteristics. Also there may be social research in the context of health characteristics. Related to this can be requirements for contextualizing Big Data analytics. A major challenge in Big Data analytics is the bias due to self selection. In general, and in practical settings, the aim is to determine the most revealing coupling of mainstream data and context. This is technically processed in Correspondence Analysis through use of the main and the supplementary data elements, i.e., individuals or objects, attributes and modalities.

Keywords: analytical focus; contextualization of data and information; Correspondence Analysis; Multiple Correspondence Analysis; dimensionality reduction; mental health

1 Introduction

Clearly, regression and other supervised, predictor and predicted relationships, have their fixed roles in all of data analytics. In unsupervised analytics, information discovery is at issue, from what can be multi-faceted data sourcing. It may be a case of dynamic (evolving) data and also data of heterogeneous precision (sensitivity, trustworthiness). Also, at issue may be both qualitative and quantitative data. Examples of such multi-faceted data sources are the increasing prevalence of secondary data sources in the domains of Big Data, and Internet of Things. Such secondary data sources may well count as contextual data.

Given multi-faceted data, it is, or it can be, open to the data scientist to consider different contextualization strategies. Consider e.g. how mobile communications, and their monitoring, provide locational data on all device bearers. So, for example, transport analytics can be contextually based on such secondary data and information sources. In this article, it is sought to explore general strategies for contextualizing analytics. Technically, the main attributes are aided, for analytical interpretation, i.e. extracting information from data, and consolidating that information into knowledge, by the supplementary, contextual, attributes.

The general discussion of [17] points to the need for focus in benefits to be drawn from data analytics. In a general sense, we may regard dimensionality reduction as a form of focused data analytics, since typically (in Correspondence Analysis, or Principal Components Analysis, PCA, or multidimensional scaling, etc.) the percentage variance or inertia explained by the reduced dimensionality space becomes the focus. Similarly feature selection is often used, to focus on the selected variables. Often the clustering of data is with the objective of retaining the clusters for the analytics. Therefore summarization is associated with focusing the analysis. Thus focus is a key practical consideration in data analysis. Then, though, Benzécri’s principle of what we are analysing being both homogeneous and exhaustive should be very central also, from a practical perspective.

Just in this practical sense, we begin by regarding contextualization as the association of two or more foci, i.e. two or more directions of analytical interest. This association is asymmetric. One focal point is located with reference to the other. This can be generalized to multiple focal directions in our analyses.

We begin with such contextualization in general. Then we take contextualization further in the context of Big Data. In the former, applications related to health and lifestyle analytics are at issue.
In the latter, a central theme is the use of secondary data, whether or not with Big Data characteristics, so as to direct and focus the analytics.

1.1 Secondary and Contextual Data Sources: Complementarity of Supervised and Unsupervised Analytics

There follows a short description of unsupervised analytics, related to dynamics and evolving contexts, and supervised analytics, related to having a training set and invoking a machine learning or statistical modelling approach to specified, stable data sources.

In [1], the 2nd principle of data analysis is the following: “The model should follow the data, and not the reverse.” Unlike statistical testing that justifies a model, “... what we need is to have a rigorous method that extracts structures from data”. “A model is, briefly, a system of formulas which allows the calculation, as a function of the unobservable variables, the observed quantities”. “The term, unobservable variable, is, in a certain sense, relative to the state of science. More than a physical measure, firstly conjectured, such as energy or electric charge, it is now recognised to exist, and is measured as space or time. But the human sciences grope around still to establish rigorous laws: and while in astronomy some very simple axioms govern the movement of the most complex systems, a psychologist can rarely boast that the exhaustive study of elementary phenomena allows the precise prediction of the evolution of a complex case: ... the reach of a model rarely goes beyond the determined field of observations for which it was conceived. ... Often, effectively, the model has so many unobservable variables, and the experiments are so difficult (or the observations are so rare), that one has no trouble to explain the latter by the former (observations by experiments). ... The model allows, optimally, to predict, but not to understand. ... our aspiration to understand (not only to predict)”.
In [18], relating to activities of daily living (ADLs) of the elderly and those in poor health, use is made of “contextual information from uncertain sensor data”. The approach is described as follows: “... our algorithm learns directly from incomplete data, and inhabitants’ behavioural patterns are characterized using the learned probability distribution over various activities. The model is used to infer the activities, and the inhabitants who have carried them out.” For example, sensor data from the monitoring of a kettle boiling water, and perhaps from some other sensor, are used to model and predict some aspects of “making a tea with milk and sugar”, “making coffee with milk”, etc. Further predictive models are at issue in [19], leading to a mobile app that will trigger personalized reminders.

The foregoing examples are such that sensor systems provide contextual data, as also do movement and other types of activities, with the main attention given to health and mental well-being.

2 Short Introduction to General Contextualization

Correspondence Analysis, CA, is most appropriate for latent semantic analysis, with input data being a cross-tabulation of observations or individuals, and variables or attributes. The cross-tabulation is usually frequency of occurrence data, encompassing presence-absence values. Recoded quantitative data can also be considered. Quantitative data, without recoding, can be used too. Therefore CA is very appropriate for mixed qualitative and quantitative data. Categorical data is quite central here.

CA can be contrasted with PCA, where each attribute is centred to zero mean and reduced to unit standard deviation (termed standardization when both are carried out), and with TF-IDF (term frequency, inverse document frequency), which is used in Latent Semantic Analysis of document-term cross-tabulation data. CA uses the following. The chi-squared distance is defined on both rows and on columns. This is a weighted Euclidean distance between, respectively, row profiles and column profiles. A profile is defined as the row or column vector divided, respectively, by the row or column total. Profiles are computed from the frequencies in the input cross-tabulation matrix, i.e. each value divided by the grand total. In [10, 14], CA is characterized as “a tale of three metrics”, these being the chi-squared metric for the dual clouds of observations and attributes
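
The profile and chi-squared distance definitions above, together with the role of supplementary elements and the percentage of inertia explained by each factor, can be illustrated with a minimal sketch. It assumes the standard SVD formulation of CA on standardized residuals, which is one common route and not necessarily the authors' implementation. The function names (`correspondence_analysis`, `chi2_distance`, `project_supplementary_rows`) and the small count table are hypothetical and chosen only for illustration.

```python
import numpy as np


def correspondence_analysis(table, tol=1e-12):
    """Correspondence Analysis of a cross-tabulation via SVD (illustrative sketch)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # frequencies: each value / grand total
    r = P.sum(axis=1)                     # row masses (row totals of P)
    c = P.sum(axis=0)                     # column masses (column totals of P)
    row_profiles = P / r[:, None]         # each row divided by its row total

    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = sv > tol                       # drop numerically null dimensions
    U, sv, V = U[:, keep], sv[keep], Vt[keep].T

    return {
        "row_profiles": row_profiles,
        "col_masses": c,
        "inertia": sv ** 2,                              # inertia per factor
        "row_coords": (U * sv) / np.sqrt(r)[:, None],    # principal coordinates
        "col_std_coords": V / np.sqrt(c)[:, None],       # standard coordinates
    }


def chi2_distance(profile_a, profile_b, col_masses):
    """Chi-squared distance: Euclidean distance weighted by 1 / column mass."""
    diff = np.asarray(profile_a) - np.asarray(profile_b)
    return float(np.sqrt(np.sum(diff ** 2 / col_masses)))


def project_supplementary_rows(rows, ca):
    """Place supplementary (contextual) rows in the factor space of the main rows."""
    rows = np.atleast_2d(np.asarray(rows, dtype=float))
    profiles = rows / rows.sum(axis=1, keepdims=True)
    return (profiles - ca["col_masses"]) @ ca["col_std_coords"]


# Hypothetical counts: 4 groups of individuals (rows) by 3 response modalities.
main_table = np.array([[20, 5, 10],
                       [8, 15, 6],
                       [4, 9, 22],
                       [12, 12, 12]])
supplementary_row = np.array([3, 1, 6])   # hypothetical contextual row

ca = correspondence_analysis(main_table)
percent_inertia = 100 * ca["inertia"] / ca["inertia"].sum()
supp_coords = project_supplementary_rows(supplementary_row, ca)

d_profiles = chi2_distance(ca["row_profiles"][0], ca["row_profiles"][1],
                           ca["col_masses"])
d_factors = float(np.linalg.norm(ca["row_coords"][0] - ca["row_coords"][1]))

print("percentage inertia per factor:", np.round(percent_inertia, 2))
print("chi-squared distance, rows 0 and 1:", round(d_profiles, 4))
print("same distance in factor space:", round(d_factors, 4))
print("supplementary row coordinates:", np.round(supp_coords, 4))
```

In this sketch the supplementary row does not enter the SVD, so it has no influence on the factors; it is only located relative to them. When all non-null factors are retained, the Euclidean distance between two rows in the factor space agrees with the chi-squared distance between their profiles.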


Institution

Data Analytics

Course

Data Analytics



Document information

Uploaded on: July 26, 2024
Number of pages: 19
Written in: 2023/2024
Type: Exam (elaborations)
Contains: Questions & answers




