Lab 2: Natural language processing

The goal of this lab is to use natural language processing (NLP) to measure a psychological construct and compare the validity of this measure with rating scales.

You do not need to prepare for the lab, but bring a computer.

Supporting PowerPoint slides: Lab 2 NLP Statistical Semantics.pptx

Below are the steps that you need to complete during the lab.

1. Create work groups. Divide yourselves into groups of 2-4 students. List the names of the group members in the Course Sheet.

2. Collect data

Define a to-be-measured psychological construct (e.g., depression)
Specify a semantic question of the construct
- A semantic question measuring the psychological construct (e.g., “Describe whether you feel depressed or not in two descriptive words, when you are in CONTEXT: ____ ____”)
Specify a rating scale of the construct
- A rating scale measuring the psychological construct (e.g., “Describe your level of depression when you are in the CONTEXT: very low, low, medium, high, very high”)
Context: To simplify the data collection, generate (at least) six CONTEXTS where you hypothesize low values (-1, e.g., “with friends”) or high values (+1, e.g., “when you fail an exam”) on the construct.
Collect data so that you have at least 60 data points

Example

Describe whether you feel depressed or not in two descriptive words, when you are in CONTEXT

Describe your level of depression when you are in the CONTEXT: very low (-2), low (-1), medium (0), high (1), very high (2)

Context (hypothesis): with friends (-1), failed exam (+1)

Put your data in a Google Sheet and put the link to the sheet in the Course Sheet.

Example datasheet with 2 contexts (but use six contexts or more). The experimental leaders fill in the columns "participants", "context" (the same for all participants), and "hypothesis" (do not show this column to participants). Participants fill in the columns "words" and "rating".

Table 1.

Participants	Context	Hypothesis	Words	Rating
1	with friends	+1	Happy, great	2
1	failed exam	-1	Low down	-1
2	with friends	+1	Fine good	1
2	failed exam	-1	Bad low	-2
3	with friends	+1	Fantastic ok	1

Structure your data file
- Use the structure specified in the table above, with each response on its own row.
- Order the rows according to the hypothesis (i.e., all low-hypothesis rows first, followed by all high-hypothesis rows).
- Paste the ordered data into semanticexcel.com.
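The ordering step can also be done outside the spreadsheet. The following is a minimal sketch, assuming the rows follow the Table 1 layout; the example rows and the helper names `order_by_hypothesis` and `to_tsv` are made up for illustration and are not part of semanticexcel.com.

```python
# Hypothetical rows in the Table 1 layout:
# (participant, context, hypothesis, words, rating)
rows = [
    (1, "with friends", 1, "Happy, great", 2),
    (1, "failed exam", -1, "Low down", -1),
    (2, "with friends", 1, "Fine good", 1),
    (2, "failed exam", -1, "Bad low", -2),
    (3, "with friends", 1, "Fantastic ok", 1),
]


def order_by_hypothesis(rows):
    """Return the rows with all low-hypothesis (-1) rows before the high (+1) rows.

    sorted() is stable, so rows keep their original order within each group.
    """
    return sorted(rows, key=lambda row: row[2])


def to_tsv(rows):
    """Join rows into tab-separated text that can be pasted into a sheet."""
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


ordered = order_by_hypothesis(rows)
print(to_tsv(ordered))
```

Sorting in a Google Sheet on the "hypothesis" column gives the same result; the sketch only makes the intended row order explicit.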

Alternative data collections:

Replicate the methods found in the articles on Canvas:
Validation emotions: Sikström S, Valavičiūtė I, Kuusela I, Evors N. Question-based computational language assessment shows higher validity than rating scales in the categorization of emotional state narratives. Preprint from Research Square, 16 Dec 2022. https://www.researchsquare.com/article/rs-2355099/v1 . PPR: PPR585649
Validation personality: Sikström S, Valavičiūtė I, Kajonius P. (submitted). Five Small Words Capture the Big Five: Personality Assessment Using Natural Language Processing.
Rating scales: Kjell ONE, Kjell K, Schwartz HA. AI-based Large Language Models are Ready to Transform Psychological Health Assessment.

High norm:
- Generate 10 words that are associated with high values on your construct
- For example: “down, low, tired…”
Low norm:
- Generate 10 words that are associated with low values on your construct
- For example: “happy, glad, strong…”
Collect data from at least five persons (5*10 words)

3. Analysis of data

For how to use semanticexcel.com, see: Sikström et al 2020 SemanticExcel.pdf

On semanticexcel.com, conduct the following analyses:

(If you get the warning message "Your connection is not private", ignore it, press "Advanced", and then click the link "semanticexcel.com".)

Rating scales
- Make a t-test between the rating scale responses associated with a high hypothesis (i.e., rows with +1 in Table 1 above) and the rating scale responses associated with a low hypothesis (i.e., rows with -1 in Table 1 above). In semanticexcel.com, the t-test can be found under Functions/Numerical functions/t-test.
- Correlate the rating scales with the hypothesis (i.e., calculate the Pearson correlation between the column "Rating" and the column "Hypothesis" in Table 1 above). In semanticexcel.com, correlations can be found under Functions/Numerical functions/t-test.
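To make the two rating-scale analyses concrete, here is a minimal sketch that computes an independent-samples t-value and a Pearson correlation by hand. It assumes a Student's t-test with pooled variance; whether semanticexcel.com uses this variant is not stated here. The data and the function names `pooled_t` and `pearson_r` are hypothetical, and no p-value is computed.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pooled_t(high, low):
    """Student's independent-samples t (equal variances assumed) and its df."""
    n1, n2 = len(high), len(low)
    m1, m2 = mean(high), mean(low)
    ss1 = sum((x - m1) ** 2 for x in high)
    ss2 = sum((x - m2) ** 2 for x in low)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: one hypothesis code and one rating per row.
hypothesis = [1, -1, 1, -1, 1, -1]
rating = [2, -1, 1, -2, 1, 0]

# Split the ratings by hypothesis, as in the t-test step.
high = [r for h, r in zip(hypothesis, rating) if h == 1]
low = [r for h, r in zip(hypothesis, rating) if h == -1]

t, df = pooled_t(high, low)
r = pearson_r(hypothesis, rating)
print(f"t({df}) = {t:.2f}, r = {r:.2f}")
```

Because the hypothesis column only takes the values -1 and +1, the t-test and the correlation describe the same group difference in two ways.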
Word norms (using semantic similarity)
- For background read: [Kjell, Kjell, Garcia & Sikström, 2020] Semantic Similarity Scales.pdf
- Create norms. Go to "Scales/My norms/Create". Select/input your space (English 1).
  - Make a high norm (with a name such as "HappyHighNorm") using the high norm words (e.g., "happy glad fine...").
  - Similarly, make a low norm (with a name such as "HappyLowNorm") using the low norm words (e.g., "sad down low...").
  - Similarly, make a bipolar norm by putting the high norm words ("happy glad fine...") in "text norm". Then press "Advanced options" and put the low norm words in "Bipolar norm ( Subtract text norm )".
- Measure your semantic questions using the norms. Go to Functions/Semantic Similarity/Norms. Measure the high, low and bipolar scales in three columns.
- Evaluate the scales with t-tests that compare the norm values (numerical values) across the hypothesis (+1 or -1). Make a t-test between the word norm values associated with a high hypothesis (i.e., rows with +1 in Table 1 above) and the word norm values associated with a low hypothesis (i.e., rows with -1 in Table 1 above). In semanticexcel.com, the t-test can be found under Functions/Numerical functions/t-test.
- Compare these results with the rating scales.
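The idea behind word norms can be illustrated with a small sketch. It assumes that a text is represented by the normalized sum of its word vectors, that similarity is the cosine between a response and a norm, and that a bipolar norm is the high norm minus the low norm. These are generic assumptions for illustration, not a description of how semanticexcel.com is implemented, and the three-dimensional vectors in `space` are invented.

```python
import math

# Hypothetical word vectors; a real semantic space has hundreds of dimensions.
space = {
    "happy": [0.9, 0.1, 0.2],
    "glad": [0.8, 0.2, 0.1],
    "fine": [0.6, 0.3, 0.3],
    "great": [0.9, 0.2, 0.1],
    "sad": [0.1, 0.9, 0.2],
    "down": [0.2, 0.8, 0.1],
    "low": [0.2, 0.7, 0.3],
    "bad": [0.1, 0.8, 0.3],
}


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def text_vector(text):
    """Sum the vectors of the known words in a text and scale to unit length."""
    words = [w for w in text.lower().replace(",", " ").split() if w in space]
    if not words:
        raise ValueError("no word in the text is in the semantic space")
    total = [sum(column) for column in zip(*(space[w] for w in words))]
    return normalize(total)


def cosine(u, v):
    """Cosine similarity; both inputs are assumed to be unit length."""
    return sum(a * b for a, b in zip(u, v))


high_norm = text_vector("happy glad fine")
low_norm = text_vector("sad down low")
# Bipolar norm: subtract the low norm from the high norm.
bipolar_norm = normalize([h - l for h, l in zip(high_norm, low_norm)])

for response in ["Happy, great", "Bad low"]:
    v = text_vector(response)
    print(response, cosine(v, high_norm), cosine(v, low_norm), cosine(v, bipolar_norm))
```

In this toy space, a response close to the high norm words gets a positive bipolar score and a response close to the low norm words gets a negative one. These three scores correspond to the three columns you then test against the hypothesis.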
Semantic t-test between high and low predictions
- For background read: [Kjell, Kjell, Garcia & Sikström, 2020] Semantic Similarity Scales.pdf
- Go to Functions/Semantic Test and validate your hypothesis.
- Compare these results with the rating scales.
Train
- For background read: [Kjell, Kjell, Garcia & Sikström, 2020] Prediction and Semantic Trained Scales.pdf
- Go to Functions/Train. (1) Input the cells with text data in "Train on text data in start cell/last cell". (2) Input the rating scale (numerical data) in "Train to predict numerical values in start cell/last cell". (3) Go to "Advanced options" (bottom of the page) and store the predictions in a new column ("Store cross-validated predictions in"). (4) Use the participant number as the grouping variable in "During cross-validation, group training data based on".
- Compare the results with the rating scales (e.g., which t-value is the largest).
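Grouping the cross-validation by participant means that a response is always predicted by a model that has not seen any other response from the same person. The sketch below illustrates this with leave-one-participant-out cross-validation and a one-variable least-squares line. The single numerical `feature` is a hypothetical stand-in for the text representation, and the model is an assumption for illustration, not the one semanticexcel.com trains.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def grouped_cv_predictions(groups, xs, ys):
    """Predict every row from a model fitted without that row's group."""
    preds = [None] * len(xs)
    for held_out in sorted(set(groups)):
        # Train only on rows from the other participants.
        train = [i for i, g in enumerate(groups) if g != held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Predict the rows of the held-out participant.
        for i, g in enumerate(groups):
            if g == held_out:
                preds[i] = a * xs[i] + b
    return preds


# Hypothetical data: two responses from each of three participants.
participants = [1, 1, 2, 2, 3, 3]
feature = [0.8, -0.6, 0.5, -0.9, 0.7, -0.4]  # made-up text-derived score
rating = [2, -1, 1, -2, 1, 0]

preds = grouped_cv_predictions(participants, feature, rating)
for p, y, y_hat in zip(participants, rating, preds):
    print(p, y, round(y_hat, 2))
```

Without the grouping, responses from the same participant could end up in both the training and the test data, which tends to make the predictions look better than they are.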
Plot
- Plot the word clouds of your data. Go to Functions/Plots.
  - Summarize your data collection in a single word cloud: Put your texts in "Texts (to plot: start cell/last cell".
  - Make word clouds describing words that discriminate your hypothesis: Add the column with your hypothesis (numerical value of -1 or +1) as the x-axis on your word cloud in "Select numerical values to plot on the x-axel from cell input (must be of the same length as the texts): start cell - last cell". This plot will show two word clouds: the cloud on the left contains words indicative of the low hypothesis, and the cloud on the right contains words indicative of the high hypothesis.
  - Make word clouds describing words that discriminate high or low values on your rating scales: Add the rating scale as the x-axis on your word cloud (same procedure as in the previous step).
Cluster
- Go to Functions/Cluster. Put your text in "Texts 1: start cell/last cell". Set Number of clusters to 2 or 4.
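Clustering groups responses by how similar their text representations are, without using the hypothesis or the ratings. The sketch below uses plain k-means as one common clustering method. Which algorithm semanticexcel.com uses is not stated here, so this is an assumption, and the two-dimensional `points` are invented stand-ins for text vectors.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical 2-dimensional text vectors, one per response.
points = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8), (0.7, 0.1)]
labels = kmeans(points, k=2)
print(labels)
```

A useful check in the lab is whether the clusters you get line up with your high and low hypothesis rows.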

Conclusion

Compare the validity (i.e., whether you get significant results) of the different semantic measures (word norms, semantic test and training) and the rating scales. Are the results significant? Which measure has the highest effect size (look at the t-values)? Add this to the discussion in your lab report.
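When the t-tests have the same degrees of freedom, the largest t-value also means the largest effect size. If you want a single number that is comparable across analyses, an independent-samples t-value can be converted to a correlation-type effect size with r = t / sqrt(t² + df). The sketch below applies this standard conversion; the t-values in `results` are invented placeholders, not real results.

```python
import math


def t_to_r(t, df):
    """Convert an independent-samples t-value to the effect size r."""
    return t / math.sqrt(t * t + df)


# Hypothetical t-values for illustration only; replace them with your own.
results = {
    "rating scale": 3.5,
    "word norm": 2.8,
    "semantic test": 4.1,
    "training": 3.0,
}
df = 58  # e.g., 60 data points in two groups: 60 - 2

effect_sizes = {name: t_to_r(t, df) for name, t in results.items()}
ranked = sorted(effect_sizes, key=effect_sizes.get, reverse=True)
for name in ranked:
    print(f"{name}: r = {effect_sizes[name]:.2f}")
```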

Transformer-based models - Additional analysis

Semanticexcel allows for using transformer-based models, for example BERT. Rerun one, or all, of your analyses using BERT. We have not used transformer-based models earlier in this lab because the computation is slow; otherwise this is the recommended setting for all analyses.

Go to "Advanced options" (in one of the analyses described above). Under "Set Parameters here" select (1) "BERT", (2) "/Use models instead of LSA", and (3) "bert-base-uncased". Compare how well the BERT model performs relative to the standard settings with the LSA model.

4. Summary of data collection
For each group, summarize the results in one row in the Course Sheet (sheet Lab 2, row 40). That is, write the psychological construct that you have studied and the r and p values for the word norms, semantic test, training and rating scales.

5. Write a lab report on your study. The lab report should be 3 to 5 pages long. Write the report in Google Docs and share a link in the Course Sheet. The report should be completed the week after the lab. It should include the following headings.

a. Title page including: Title of the study and group members

b. A short conceptual overview of large language models in the context of this lab

c. Which psychological construct you chose and why you chose it

d. Hypothesis

e. Method

f. Results

g. Discussion

h. Conclusion

6. Read another group's lab report and provide comments. You can make comments directly in the Google Docs document (share the document so that this is possible). Group 1 comments on group 2, group 2 comments on group 3, etc. The last group comments on the first group.

7. Revise your lab report based on the comments from the other group.

Things to think about

It is your responsibility to make the lab report as clear as possible so that readers understand the details.
The seminar is mandatory (presence and active participation).
It is possible to fail the seminar.