Class notes for Quantitative Data

Descriptive vs inferential statistics
I chose to do some research into descriptive and inferential statistics, two terms I hadn’t encountered before.

Descriptive statistics provide information about the properties of a data set. They refer only to the elements that were actually measured.

In contrast, inferential statistics provide information about a population based on a sample. They make assumptions about the population overall based on the properties of the sampled elements.
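As a rough sketch of the difference, the example below first describes a small sample and then uses it to estimate something about a wider population. The borrowing figures are invented for illustration, and the interval uses a simple normal approximation rather than anything taken from the module readings.

```python
import math
import statistics

# Hypothetical sample: books borrowed last month by 10 surveyed students.
sample = [2, 5, 3, 8, 4, 6, 1, 7, 5, 4]

# Descriptive statistics: these describe only the 10 students measured.
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Inferential statistics: use the sample to estimate the wider population.
# Rough 95% confidence interval for the population mean, using the normal
# approximation (z = 1.96). A t-based interval would be a little wider
# for a sample this small.
standard_error = sample_sd / math.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"Sample mean {sample_mean}, median {sample_median}, SD {sample_sd:.2f}")
print(f"Estimated population mean: between {ci_low:.2f} and {ci_high:.2f}")
```

The first three values are statements about the sample itself. The interval is a claim about students who were never surveyed, which is why it comes with uncertainty attached.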

While doing Module 5, “Quantitative data collection and analysis”, I was impressed by how many ways there are to analyse and show quantitative data using numbers, formulas, specific terms and concepts, tables, diagrams, charts, software, etc. One example is the box-and-whisker plot (or box plot), which shows the minimum, first quartile, median, third quartile, and maximum on the ‘same page’. I love it. It looks simple, clear, and understandable.

https://www.mathsisfun.com/definitions/box-and-whisker-plot.html
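The five values a box plot displays can be worked out with Python's standard library. This is a minimal sketch with made-up numbers. Textbooks and software use slightly different quartile conventions, so the first and third quartiles may not match another tool exactly.

```python
import statistics

# Hypothetical data: minutes spent in the library by nine visitors.
minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(n=4) returns the three cut points: Q1, median, Q3.
# method="inclusive" treats the data as the whole group being described;
# other quartile conventions can give slightly different Q1 and Q3.
q1, median, q3 = statistics.quantiles(minutes, n=4, method="inclusive")

five_number_summary = {
    "minimum": min(minutes),
    "first quartile": q1,
    "median": median,
    "third quartile": q3,
    "maximum": max(minutes),
}

for name, value in five_number_summary.items():
    print(f"{name}: {value}")
```

The box in the plot runs from the first to the third quartile with a line at the median, and the whiskers extend out to the minimum and maximum.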

The terms ‘ordinal’ and ‘categorical’, used to describe the nature of dependent variables, puzzled me at first. However, when I read the discussion of these they made a lot more sense to me.

Categorical variables have at least two categories, but a higher or lower value cannot be ascribed to any of the categories. Take eye colour, for example: are brown eyes higher in value than blue? And where would green eyes fall? An ordinal variable also has categories, but these can be placed in order of value from lowest to highest, e.g. higher education awards can be ordered from lowest (Bachelor) to higher (Graduate Certificate) and so on. (The discussion then went into interval variables and the connection to ordinal, but I will have to read that again.)
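One way to see the difference is to look at which summaries make sense for each type. In this sketch the eye colour and award data are invented, and the award ladder beyond Bachelor and Graduate Certificate is an assumed, simplified ordering used only for illustration.

```python
import statistics

# Categorical: labels with no inherent order, so counting is meaningful
# but "higher" or "lower" is not.
eye_colours = ["brown", "blue", "brown", "green", "blue", "brown"]
most_common_colour = statistics.mode(eye_colours)

# Ordinal: categories with a rank. This ordering is an assumed,
# simplified ladder for illustration only.
AWARD_ORDER = ["Bachelor", "Graduate Certificate", "Masters", "Doctorate"]
rank = {award: position for position, award in enumerate(AWARD_ORDER)}

awards = ["Masters", "Bachelor", "Doctorate", "Bachelor", "Graduate Certificate"]

# Because the categories have an order, they can be sorted...
sorted_awards = sorted(awards, key=rank.get)

# ...and a "middle" value makes sense. median_low returns an actual
# rank from the data rather than averaging two ranks.
median_award = AWARD_ORDER[statistics.median_low([rank[a] for a in awards])]

print("Most common eye colour:", most_common_colour)
print("Awards in order:", sorted_awards)
print("Median award:", median_award)
```

For eye colour only the most common value (the mode) is meaningful. For the awards a median is also available, but an average would not be, because the gaps between ranks are not known to be equal.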

OK, so I thought I’d try and wrap my head around what a t-test is. The most basic way I can describe this, I think, is that it is used to compare the mean value of one group against the mean value of another. A t-test can only be used to compare two groups. For three groups or more, there is a different calculation / test that should be used (called ANOVA).
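To make the comparison of two means concrete, here is a minimal sketch of one common form of the t statistic (Welch's version, which does not assume the two groups have equal variance). The visit counts are invented, and the function name `welch_t` is just a label chosen for this example.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """t statistic for two independent groups (Welch's version)."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error of the difference between the two means.
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / se


# Hypothetical data: library visits per month for two groups of students.
undergraduates = [4, 6, 5, 7, 3, 5]
postgraduates = [8, 9, 7, 10, 6, 8]

t_value = welch_t(undergraduates, postgraduates)
print(f"t = {t_value:.3f}")
```

The further the t value is from zero, the larger the gap between the two means relative to the spread in the data. Turning it into a p-value needs the t distribution, which is the part that statistical software normally handles.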

 

I tried to find an example of how a t-test could be used in the LIS sector, and found the following article:

 

Jiang, Y., Chi, X., Lou, Y., Zuo, L., Chu, Y., & Zhuge, Q. (2021). Peer reading promotion in university libraries. Information Technology and Libraries, 40(1), 1–17. https://doi.org/10.6017/ITAL.V40I1.12175

 

I went down a bit of a rabbit hole, I think, because the analysis in that article feels a bit too high level for me to comprehend at this stage! I think I can see in Table 6 how the ‘T’ value represents the difference in means for two factors at a time (such as a comparison of: the mean for respondents seeking peer opinions for books compared to the mean for respondents with sparse social networks seeking peer opinions for books). I’m not sure how the resulting figure of “-3.408”, being a negative number, led the writers to deduce that respondents with sparse social networks seek peer opinions more often …? I’ll have to take a closer look.
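One possible explanation for the negative sign, which I have not checked against the article's method, is simply the order in which the two groups are compared. A t value is the difference in means divided by a standard error, and the standard error is always positive, so the sign comes entirely from which mean is subtracted from which. The scores below are invented and are not data from the article.

```python
import statistics

# Hypothetical scores (1-5) for how often respondents seek peer opinions.
# These are invented numbers, not data from Jiang et al. (2021).
dense_network = [2, 3, 2, 3, 3, 2]
sparse_network = [4, 3, 4, 5, 4, 4]

# The numerator of a t statistic is (mean of first group - mean of second).
diff_dense_first = statistics.mean(dense_network) - statistics.mean(sparse_network)
diff_sparse_first = statistics.mean(sparse_network) - statistics.mean(dense_network)

print("Dense minus sparse:", diff_dense_first)   # negative
print("Sparse minus dense:", diff_sparse_first)  # positive
```

Under that reading, a negative t would mean only that the group listed first had the lower mean, which would fit a conclusion that the second group (here, sparse networks) scored higher.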

 

As with lots of these terms, once you start researching them you come across other unfamiliar terms. I think I will need to now understand regression analysis to get a better understanding of t-tests!

Both Accidental/Convenience Sampling and Snowball Sampling are types of non-probability sampling. Accidental sampling is where a sample is conveniently available and is able to fulfil the requirements to participate in the study. Williamson (2018) provides the example of a researcher standing outside an academic library asking students before they enter what their opinions are of the library.

Snowball sampling is where the researcher carefully chooses a few interviewees who fit the objective of the study and then relies on these initial participants to contact others similar to themselves. Snowball sampling is often used when making contact with possible participants is challenging. Williamson (2018) gave the example of using snowball sampling in relation to groups like the homeless. Neither accidental sampling nor snowball sampling should be used to make generalisations to the population.
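The referral process in snowball sampling can be sketched as following a network of contacts outward from the first few participants. Everything here is hypothetical: the names, the referral network, and the `snowball_sample` helper are made up to illustrate the idea.

```python
from collections import deque

# Hypothetical referral network: who each participant can put the
# researcher in touch with. All names are invented.
referrals = {
    "Ana": ["Ben", "Caz"],
    "Ben": ["Dev"],
    "Caz": ["Dev", "Eli"],
    "Dev": [],
    "Eli": ["Fay"],
    "Fay": [],
}


def snowball_sample(seeds, referrals, max_size):
    """Start from hand-picked seeds and follow referrals wave by wave."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Each participant passes the researcher on to their own contacts.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample


participants = snowball_sample(["Ana"], referrals, max_size=5)
print(participants)
```

Every participant is reached through the first person's contacts, so anyone outside that network has no chance of being chosen. That is one way to see why the results should not be generalised to the population.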

One of the terms I was unfamiliar with is from last week: “Quasi-Experiments”. I had to look into this term in further detail to fully understand it, which included reading the “APA Dictionary of Psychology”. From my reading, I now understand that a quasi-experiment is one where participants cannot be assigned at random to the control group and the experimental group for a particular reason, often an ethical one. For this reason a quasi-experiment is not considered a true experiment.

The other term I was unfamiliar with was “statistical significance”. Although I was able to ascertain most of its meaning from the name, I still had to read the definition carefully as it was not something I had come across before. From this I learnt that the definition of statistical significance is much more precise than I would have assumed. By the usual convention, for a result to be “statistically significant” there must be a less than 5% chance of seeing a result at least that extreme through random chance alone.
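The 5% idea can be illustrated with a small permutation test, which is one of several ways to get a p-value and is used here only because it needs no formulas. The scores are invented. The sketch asks: if group labels made no difference, how often would shuffling the labels give a gap in means at least as big as the one observed?

```python
from itertools import combinations

# Hypothetical scores for two small groups (invented for illustration).
group_a = [1, 2, 3, 4]
group_b = [5, 6, 7, 8]


def mean(values):
    return sum(values) / len(values)


observed = abs(mean(group_a) - mean(group_b))

# If group membership made no difference, any split of the 8 scores into
# two groups of 4 would be equally likely. Count how many splits give a
# difference in means at least as large as the one observed.
pooled = group_a + group_b
total = 0
as_extreme = 0
for indices in combinations(range(len(pooled)), len(group_a)):
    first = [pooled[i] for i in indices]
    second = [pooled[i] for i in range(len(pooled)) if i not in indices]
    total += 1
    if abs(mean(first) - mean(second)) >= observed:
        as_extreme += 1

p_value = as_extreme / total
significant = p_value < 0.05  # 0.05 is a convention, not a law

print(f"{as_extreme} of {total} splits are as extreme; p = {p_value:.4f}")
print("Statistically significant at the 5% level:", significant)
```

Only 2 of the 70 possible splits produce a gap this large, so the p-value is below 0.05 and the difference would conventionally be called statistically significant.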

Whilst working in an academic library for the past two years I have come across both SPSS and R and knew that they were programs used to help analyse and organise data. Students are regularly stressed when working with these programs; however, I did not know the difference between them. The programs can be used when Excel is not powerful enough to analyse data. Here is what I found:

R – very popular as it is an open-source, freely available resource used for statistical analysis, data visualisation and manipulation. Advice and work are shared by the user community. It is driven by commands (coding) so can be complex, but it can be programmed to handle different types of data and projects. Free resources are readily available to support users.

SPSS – developed by IBM. It is commonly used in health, education and marketing research. The software analyses and manipulates data. SPSS is described as being more user friendly, driven by menus and point and click. It resembles Excel but can handle larger data sets. The main negatives are the high cost and limited functionality.
