Homework assignment #4 - Introduction to Python - 3

Ahoi hoi folks and welcome to homework assignment #4 of the “Python for Psychologists” course, winter term 2024. Within this homework assignment you’ll finish up with the Python lessons.

Some general points:

Throughout the assignment you will be asked either to answer questions in written form or to write code directly. For written answers, you can just use the empty markdown cells below the questions. For code, you can use the empty code cells below the questions and actually run them so that the output (if there is any) will be included in the notebook. The number and type of cells you need to answer a given question will be indicated, but if you want to use more cells, please don’t hesitate to add a few. If you see something like “Type Markdown …” or “Please provide your solution here”, you can just click (or double-click) on it to make it a markdown cell.

Please don’t forget to save your notebook regularly (maybe every few minutes), either via Ctrl + S (Strg + S on German keyboards) on Windows or Cmd + S on macOS. Once you have finished the assignment, please save the entire notebook via File -> Save as... and name it FirstNameSurname_pfp_2024_homework_assignment_4, where FirstName should be changed to your first name and Surname to your surname.

The aim of this homework assignment is to go through the things we talked about in the session again and deepen your understanding of them. This will entail the following parts:

1. Control Flow operations
    a. if-else
    b. for-loops
    c. while-loops
2. Functions
3. Outro/Q&A

If some of the things asked here were actually not (sufficiently) covered during the session, please let me know.

Again, I hope you have a great time and that you view this as less like an assignment and more like a fun add on to the session. Remember: Be a mover! Learning new things takes time and frustrations can’t always be avoided. If you can’t figure something out it’s best to take some time off, do something enjoyable for a while and come back with a fresh mind.

Again, you putting in the effort is already fantastic and what really matters!

Homework assignment #4

1. Control Flow

During the session we talked about different control flow methods in Python. Could you please list the ones we’ve discussed, what they do and when you’d use them? You can just write your answers in the markdown cells below.

Please write your answer here.

Great, thank you! Now let’s work with each of these a bit, so you get the hang of it. This time we will give you two time series that we want to compare a bit in this homework assignment. These time series could for example be the brain activity in two different regions. That means that each value in our lists represents one time point.

ts_one = [
    0.13307103, 0.52482272, 0.90879086, 0.5456777, 0.9319212,
    0.19735405, 0.36852966, 0.02232372, 0.0799484, 0.66565337,
    0.98361107, 0.60459186, 0.4652598, 0.38765896, 0.2889894,
    0.68504588, 0.69733066, 0.2196992, 0.6629188, 0.86817003,
]

ts_two = [
    0.89998698, 0.7969356, 0.47385577, 0.15619482, 0.54274163,
    0.82320681, 0.27859028, 0.47321448, 0.51521323, 0.96664894,
    0.40658335, 0.0655123, 0.88093774, 0.74728249, 0.47394902,
    0.33324691, 0.70377158, 0.49065974, 0.60635258, 0.07307129,
]

a. if-else

First, let us compare the number of time points in each time series. This matters because time series of unequal length will cause problems when we compare them element by element later.

Please store the result of your length comparison as a boolean in a variable called same_length. Then use an if statement to print whether the two time series are equal in length. If they are not, please also indicate which time series is longer (e.g. via print).
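
As a minimal sketch of this pattern, here is a length comparison on two made-up lists. The names list_a, list_b, equal_size and message are hypothetical and the data is not the assignment’s time series; the point is only how a stored boolean can drive an if / elif / else chain.

```python
# Toy data for illustration only (not the assignment's time series).
list_a = [1, 2, 3]
list_b = [4, 5, 6, 7]

# A comparison evaluates directly to a boolean (True or False).
equal_size = len(list_a) == len(list_b)

if equal_size:
    message = "Both lists have the same length."
elif len(list_a) > len(list_b):
    message = "list_a is longer."
else:
    message = "list_b is longer."

print(message)
```

Because the boolean is stored in a variable, it can be reused in later if statements without repeating the comparison.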

# Please write your solution here

Awesome! Now we can use our same_length variable to do further analyses.

Let’s now also compute the mean of each time series and store it in a variable, but only if the time series have the same length! How would you do this? To calculate the mean of the lists you can go back to the previous homework assignments and sessions where we explored different options to do this or you could use simple math operators.

# Please write your solution here

That’s it for specific if-else exercises. If you still feel a bit unfamiliar with these statements, don’t worry, you will get plenty of exercise in the following sections.

b. for-loops

To verify if our calculations above were correct let’s re-calculate the mean of each time series using a for loop. Please calculate the mean of each time series and store it in a variable.

Hint: It’s possible to update an existing variable in a for-loop by using the operators we’ve discussed in the last session.
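
To illustrate the hint, here is a small sketch of updating a variable inside a for-loop with the += operator. The list values and the names total and mean_value are made up for this example.

```python
values = [2.0, 4.0, 6.0]  # toy data, not the assignment's time series

total = 0.0
for value in values:
    total += value  # shorthand for: total = total + value

# The mean is the accumulated sum divided by the number of elements.
mean_value = total / len(values)
print(mean_value)
```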

# Please write your solution here

That’s already a nice loop, well done. Next, we also want to calculate the minimum and maximum of these lists. Please implement this with a for-loop.

First initialize the variables max_ts_one and min_ts_one with the first element of the ts_one list. Then loop over all elements in your list to check whether the current element is the new maximum or minimum and update the values if this is the case. Please do the same for the other time series.

Hint: For this you will need an if-statement to analyze the elements in the loop.
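
The idea of a “running” maximum and minimum can be sketched on toy data as follows. The names values, largest and smallest are hypothetical; the assignment asks for max_ts_one and min_ts_one on the real time series.

```python
values = [3, 7, 1, 5]  # toy data

# Start with the first element as the best guess for both extremes.
largest = values[0]
smallest = values[0]

for value in values:
    if value > largest:
        largest = value   # found a new maximum
    if value < smallest:
        smallest = value  # found a new minimum

print(largest, smallest)
```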

# Please write your solution here

For a later analysis you want to square each value in both lists separately. Please do so using a for-loop, appending each squared element to a new list, e.g. ts_one_squared = [].

# Please write your solution here

Optional Bonus: Looking through the notebook of this session you’ll encounter an easier way to apply an operation to each element of a list, called a list comprehension. Using the examples in the lesson or the power of Google, do the same as above (i.e. square each value in both lists separately).
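
As a rough sketch of how the two styles relate, the example below doubles (rather than squares) each value of a toy list, once with a loop and append and once with a list comprehension. All names are made up for illustration.

```python
values = [1, 2, 3, 4]  # toy data

# Loop version: start with an empty list and append one result at a time.
doubled_loop = []
for value in values:
    doubled_loop.append(value * 2)

# Comprehension version: the same operation written as a single expression.
doubled_comprehension = [value * 2 for value in values]

print(doubled_loop == doubled_comprehension)
```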

# Please write your solution here

In a short moment of weakness, you think it might be better to store the ts_one list in a dictionary instead of a list, where each key has the format TP_XX and XX represents the index of the time point in the list. Please define a dictionary ts_one_dict where you store the values in the proposed format. To save yourself some effort, use a for-loop to loop over the elements in the list and create the key - value pairs that you want to add to your dictionary.

Hint: the enumerate function might be useful.
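
For reference, enumerate yields the index and the element together on each iteration. A minimal sketch with toy data and a hypothetical key format (item_XX instead of TP_XX):

```python
scores = [0.4, 0.9, 0.7]  # toy data

scores_dict = {}
for index, score in enumerate(scores):
    # Build the key from the index with an f-string, then store the value.
    scores_dict[f"item_{index}"] = score

print(scores_dict)
```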

# Please write your solution here

Now that you have your dictionary, loop over it and print the time point “name” (stored in the keys) and the associated value, but only if the value is above 0.5. If this is not the case, simply continue to the next time point.

E.g. your printed key - value pairs could look like this: TP_1: 0.52482272. Hint: you should probably use continue.
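
A small sketch of how continue skips the rest of the current iteration, using a toy dictionary. The dictionary, its keys and the helper list printed_keys are hypothetical; printed_keys only exists so the effect can be inspected afterwards.

```python
scores_dict = {"item_0": 0.4, "item_1": 0.9, "item_2": 0.7}  # toy data

printed_keys = []
for key, value in scores_dict.items():
    if value <= 0.5:
        continue  # skip everything below and move on to the next pair
    print(f"{key}: {value}")
    printed_keys.append(key)
```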

# Please write your solution here

Optional Bonus: You realize that maybe a dictionary wasn’t that useful after all. Do you know how to delete a variable in Python? If so, please delete your dictionary.

# Please write your solution here

You think that a value below 0.1 is implausible in your analysis. Therefore, you want to test if there is such a value in the time series. If so, you want to stop the iteration through the list and print a statement that you found an implausible value in the respective list. Please do this for both lists.

Hint: try to remember how you can stop the loop.
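
As a sketch of stopping a loop early, the example below uses break on a toy list. The names values and found_at are made up; found_at simply records where the loop stopped.

```python
values = [0.8, 0.3, 0.05, 0.6, 0.01]  # toy data

found_at = None
for index, value in enumerate(values):
    if value < 0.1:
        found_at = index
        print(f"Found an implausible value at index {index}: {value}")
        break  # leave the loop immediately, later elements are not checked

if found_at is None:
    print("No implausible value found.")
```

Note that the loop stops at the first match, so the second small value at the end of the list is never visited.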

# Please write your solution here

c. while-loops

Phew, that was a lot already. Let’s now quickly review while-loops. Could you please briefly state why one has to be a bit more careful when using while-loops instead of for-loops, and what one could do when caught in an infinite loop?

Please write your solution here

Time series one shows lower values in the beginning than time series two. We are now interested in when this reverses, i.e., when do the values in ts_one become larger than in ts_two? To do so we will use a while-loop to iterate over the two lists and compare their elements to test whether ts_one is larger than ts_two.

Define a variable is_lower = True that is set to False once ts_one is larger than ts_two. Once ts_one is larger than ts_two print the index of that element.

Hint: You can use a counter variable that counts the number of iterations you spent in your loop. This variable is zero at first and increases with each iteration. This will also allow you to access the elements in your list and to print the index when ts_one exceeds ts_two.


Optional: What would you have to do to avoid running into an infinite loop if ts_one is always smaller than ts_two?
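
One way to sketch the counter idea, including a guard against running past the end of the lists, is shown below on toy data. The names first, second, still_lower and crossing_index are hypothetical, and the second loop condition is just one possible safeguard.

```python
first = [1, 2, 3, 9]   # toy data
second = [5, 6, 7, 8]

counter = 0
still_lower = True
crossing_index = None
n_points = min(len(first), len(second))

# The second condition ends the loop once every index has been checked,
# even if the first list never exceeds the second one.
while still_lower and counter < n_points:
    if first[counter] > second[counter]:
        still_lower = False
        crossing_index = counter
    else:
        counter += 1

print(crossing_index)
```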

# Please write your solution here

2. Functions

Now that the control flow is done, we are already at the last section of this homework assignment!

Here we will use functions to make coding a bit more efficient, so you don’t have to copy & paste that often. You will implement several functions in this section. However, before we start, could you please describe what functions are in Python and name the important building blocks they consist of?

Please write your solution here


Now let’s create some functions that will simplify the processes you have worked on above. How about we start with a function that computes the mean of each time series and prints the respective results? A potential workflow could look like the following:

  • define a function that takes two positional arguments, i.e. the two time series

  • for each time series compute the mean as done above and assign it to a variable

  • print the means using string formatting, e.g. “The mean value of time series 1 is: [insert_mean_value_here]”

# Please write your solution here

Oh my, that’s awesome! However, we just did some very bad Python programming: we didn’t add a docstring that describes what our function does, its inputs and what it returns.

Could you please add a somewhat informative docstring?

# Please write your solution here

Now let’s further evaluate if the docstring actually fulfills its purpose by using the help function on your newly created function.

# Please write your solution here

Seems like everything works as expected, nice! One thing that might be a useful addition to our function would be the option to not only print the mean values but also return them. Remember keyword arguments?

A feasible way to implement this would be via a keyword argument that takes booleans as values and has a default value of False. In other words:

If the keyword argument is set to False or not indicated in the function call (thus, utilizing the default value) the mean values should only be printed like implemented in the previous function. However, if the keyword argument is set to True, the mean values should be printed and additionally returned. Could you please update your function accordingly?
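
The pattern can be sketched with a deliberately different toy function that reports a sum instead of a mean. The function name report_total and the keyword argument return_total are made up for this example.

```python
def report_total(numbers, return_total=False):
    """Print the sum of a list of numbers and optionally return it.

    Parameters
    ----------
    numbers : list
        The values to add up.
    return_total : bool, optional
        If True, the sum is returned in addition to being printed.
        Defaults to False.

    Returns
    -------
    The sum if return_total is True, otherwise None.
    """
    total = 0
    for number in numbers:
        total += number
    print(f"The total is: {total}")
    if return_total:
        return total


result_default = report_total([1, 2, 3])                      # prints only
result_returned = report_total([1, 2, 3], return_total=True)  # prints and returns
```

A function that reaches its end without a return statement gives back None, which is why result_default holds None here.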

# Please write your solution here

Amazing! Given your fantastic work on this, could you please create corresponding functions for the min and max values of the time series? You can of course choose if you want to create several functions or if you want to create one function that does it all.

# Please write your solution here

Optional bonus:

The remaining tasks within this part are optional and thus don’t need to be completed if you don’t want to. This won’t affect the final assessment of your homework assignment. However, we strongly suggest that you give them a try, as they will deepen your understanding of functions and the other parts of this and the previous sessions. So, let’s get cooking!

Overall, we want to implement a function to calculate the correlation between the two time series and also a function to scale the values between 0 and 1. Please use docstrings, to indicate what each function does and to specify the input parameters and the return value.

Let’s first start with the correlation. To calculate the correlation between a variable \(x\) and a variable \(y\) we use the following equation:

\[r(x,y) = \frac{1}{N} \sum_{i=1}^N \frac{x_i - \mu_x}{\sigma_x} \frac{y_i - \mu_y}{\sigma_y}\]

where, \(N\) is the number of data points, \(\mu\) is the mean and \(\sigma\) is the standard deviation of variable \(x\) or \(y\), respectively.

Now, let’s start with a function to calculate the mean.

Define a function called calc_mean that takes exactly one list as argument and returns the mean value.

You can simply copy your code from above into the function. In case you didn’t know how to solve it above, please simply define a variable mean_value, set it to 0 and return it.

# Please write your solution here

Nice, the basics are there!

Now let’s also create a function to calculate the standard deviation. The standard deviation is defined as follows:

\[\sigma_x = \sqrt{\frac{\sum_{i=1}^N{(x_i - \mu_x)^2}}{N}}\]

where, \(N\) is the number of data points in the list \(x\) and \(\mu_x\) is the mean.

Let’s write a function calc_stdev that takes a list as input and a mean as an optional argument. This can be achieved by setting the default value of the mean input argument to None.

Then, if no mean is given, i.e., the mean is equal to its default value, call your function from above to calculate the mean of the input list. The function should return the calculated standard deviation.

You are free to implement this in whatever way works best for you, as long as you use some kind of loop or list comprehension. Please refrain from using built-in functions for the calculation itself.

In case you need some help, you can follow the steps below:

  • First, define a variable in which you will store your final standard deviation, e.g. st_dev, and set it to 0.

  • Then get \(N\), which is the length of your list

  • In case that no mean was given, calculate it using your function from above and store it in a variable e.g. mean_value.

  • Loop over all list elements. For each element in the loop:

    • subtract the mean from this element (\(x_i - \mu_x\))

    • square the result and divide it by \(N\)

    • Add the result of the previous step to your standard deviation variable

  • After you have looped over all elements, take the square root of st_dev. Remember that taking the square root is the same as raising to the power of 0.5.

  • return your calculated standard deviation.
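
The steps above could be sketched roughly as follows. This is only one possible way to write it: the function names follow the task description, while the toy list toy_values is made up so that the result is easy to check by hand (mean 5, standard deviation 2).

```python
def calc_mean(values):
    """Return the mean of a list of numbers."""
    total = 0
    for value in values:
        total += value
    return total / len(values)


def calc_stdev(values, mean=None):
    """Return the (population) standard deviation of a list of numbers."""
    st_dev = 0
    n = len(values)
    # Only compute the mean if the caller did not provide one.
    if mean is None:
        mean = calc_mean(values)
    for value in values:
        # Squared deviation from the mean, divided by N, summed up.
        st_dev += (value - mean) ** 2 / n
    # Raising to the power of 0.5 is the same as taking the square root.
    return st_dev ** 0.5


toy_values = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data
print(calc_stdev(toy_values))
```

Checking with `mean is None` rather than `mean == None` is the usual way to test for the default, and it still works when a mean of 0 is passed in.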

# Please write your solution here

Awesome job!

Okay, now we want to implement a scaling function that allows us to standardize or normalize (min-max) our data. Scaling your data is often useful. For example, some machine learning algorithms work best when the data lies between 0 and 1.

Data is standardized as follows:

\[x_{standardized} = \frac{(x - \mu_x)}{\sigma_x} \]

where \(\mu_x\) is the mean of x and \(\sigma_x\) is the standard deviation.

Normalization on the other hand works as follows:

\[x_{normalized} = \frac{(x - min_x)}{max_x - min_x} \]

where \(min_x\) and \(max_x\) are the minimum and maximum in the list \(x\), respectively.

For the standard scaler we already have all the preliminaries (functions to calculate the mean and standard deviation). We can now combine them in a function called standardize that will standardize our data.

The function should take a list as input argument and return the standardized list. Again you are free to implement this on your own. If you couldn’t implement one of the functions for the mean or the standard deviation, please simply use a default value of 1. In case you need a little help, follow these steps:

  • Define a new (empty) list (e.g. standardized)

  • Calculate the mean and the standard deviation of the input list (e.g. ts_one)

  • Loop over the elements in your list. For each element:

    • Subtract the mean

    • Divide the result by the standard deviation

    • Add the value to the standardized list

  • return the standardized list

# Please write your solution here

Great work, let’s keep going!

For the min-max scaler, we only need a function to calculate the minimum and maximum. You already did this somewhere above, so it’s mainly copy and paste.

Define a function called calc_minima_maxima that again takes one list as input and returns the minimum and maximum of this list.

# Please write your solution here

Next we’ll write a function called normal_scaler that takes a list as input, calculates the minimum and maximum of the values in the input list and scales every element of the list via the formula mentioned above:


\[x_{normalized} = \frac{(x - min_x)}{max_x - min_x} \]

You can output a list containing the scaled values of the input list via the following steps:

  • Define a new (empty) list (e.g. normalized)

  • Calculate the minimum and the maximum of the input list (e.g. ts_one)

  • Loop over the elements in your list. For each element:

    • subtract the minimum

    • divide the result by the difference between the maximum and minimum

    • add the value to the normalized list

  • return the normalized list
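
These steps could look roughly like the sketch below. It is one possible implementation under the assumption that the list contains at least two different values; if the maximum equals the minimum, the division would raise a ZeroDivisionError. The function names follow the task description and the example list is made up.

```python
def calc_minima_maxima(values):
    """Return the minimum and maximum of a list as a tuple."""
    minimum = values[0]
    maximum = values[0]
    for value in values:
        if value < minimum:
            minimum = value
        if value > maximum:
            maximum = value
    return minimum, maximum


def normal_scaler(values):
    """Return a min-max scaled copy of the list (values between 0 and 1)."""
    normalized = []
    minimum, maximum = calc_minima_maxima(values)
    value_range = maximum - minimum  # assumed to be non-zero
    for value in values:
        normalized.append((value - minimum) / value_range)
    # The return sits outside the loop so that all elements are processed.
    return normalized


print(normal_scaler([2, 4, 6, 10]))  # toy data
```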

# Please write your solution here

Amazing!

Now we can assemble our scaler function using our two different scaling approaches.

Define a function called scaler that takes a list as argument, as well as an argument called scale_type. The scale_type argument will allow us to flexibly decide how we want to scale the list. It should have standardize as its default value, meaning that we will standardize the data unless indicated otherwise.

Use an if statement to test whether the scale_type argument is standardize or normalize and then call one of your above defined scaling functions accordingly.

Return the scaled list.
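
A possible sketch of such a dispatching function is shown below. To keep the example short and self-contained, standardize and normal_scaler are written here as compact stand-ins that use the built-ins sum, min and max; in the assignment you would call your own loop-based versions instead. Raising a ValueError for an unknown scale_type is an assumption of this sketch, not something the task requires.

```python
def standardize(values):
    """Stand-in: subtract the mean, divide by the population standard deviation."""
    mean = sum(values) / len(values)
    st_dev = (sum((value - mean) ** 2 for value in values) / len(values)) ** 0.5
    return [(value - mean) / st_dev for value in values]


def normal_scaler(values):
    """Stand-in: min-max scale the values to the range 0 to 1."""
    minimum, maximum = min(values), max(values)
    return [(value - minimum) / (maximum - minimum) for value in values]


def scaler(values, scale_type="standardize"):
    """Scale a list by standardizing (default) or min-max normalizing it."""
    if scale_type == "standardize":
        return standardize(values)
    elif scale_type == "normalize":
        return normal_scaler(values)
    else:
        # Assumption of this sketch: unknown options are treated as an error.
        raise ValueError(f"Unknown scale_type: {scale_type!r}")


print(scaler([1, 2, 3]))                          # uses the default
print(scaler([1, 2, 3], scale_type="normalize"))  # explicit choice
```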

# Please write your solution here

Okay, we are already at our last step! Good job sticking with it this far.

We next want to calculate the correlation between a list \(x\) and a list \(y\). We use the following equation:


\[r(x,y) = \frac{1}{N} \sum_{i=1} ^N \frac{x_i - \mu_x}{\sigma_x} \frac{y_i - \mu_y}{\sigma_y} \]

where, \(N\) is the number of data points, \(\mu\) is the mean and \(\sigma\) is the standard deviation of variable \(x\) or \(y\), respectively.

You might notice that this equation contains the standardized versions of our lists. So we can rewrite it as


\[r(x,y) = \frac{1}{N} \sum_{i=1}^N x_{i,standardized} \, y_{i,standardized} \]

So, let’s finally put it all together into one nice function.

Write a function called calc_correlation that takes two lists as input and returns the correlation between them. Again you can solve this with any type of loop or comprehension you want. If you feel fancy, you can also implement this function using different control flow operations.

Again, here are some steps to follow along:

  • First, check if the input lists have the same length. If not, the function should return None.

  • Initialize your correlation variable with a value of 0

  • Get the length of the lists and store it in a variable N

  • Standardize each list (use your function from above)

  • Loop over the range of N, multiply the elements of both lists at each index and add them to your correlation variable

  • divide your correlation variable by \(N\)

  • return the correlation

In case you want to try something: See what happens if you use for x_elem, y_elem in zip(x, y): to start your loop. Can you use this to calculate your correlation?
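
As a rough sketch of the zip idea: zip pairs up the elements of two lists position by position, so the loop receives one element from each list per iteration. The version below is one possible implementation; the helper standardize is redefined here only to keep the example self-contained, and the toy lists are made up (a perfectly linear relationship should give a correlation of about 1 or -1).

```python
def standardize(values):
    """Return the list with the mean subtracted and divided by the standard deviation."""
    n = len(values)
    total = 0
    for value in values:
        total += value
    mean = total / n
    variance = 0
    for value in values:
        variance += (value - mean) ** 2 / n
    st_dev = variance ** 0.5
    return [(value - mean) / st_dev for value in values]


def calc_correlation(x, y):
    """Return the correlation between two lists, or None if their lengths differ."""
    if len(x) != len(y):
        return None
    correlation = 0
    n = len(x)
    x_std = standardize(x)
    y_std = standardize(y)
    # zip yields pairs (x_std[0], y_std[0]), (x_std[1], y_std[1]), ...
    for x_elem, y_elem in zip(x_std, y_std):
        correlation += x_elem * y_elem
    return correlation / n


print(calc_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(calc_correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```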

# Please write your solution here

Amazing! You just wrote your very own correlation function. I hope this illustrates what functions are all about and how easy they are to use once we lose our reservations.

3. Outro/Q&A

And that concludes our Introduction to Python, thanks for keeping up! Awesome work, folks!

Please remember to save the notebook as described in the beginning and send it via e-mail as usual. If you have questions, encountered problems and/or errors, please use the cell below to outline things further. The same holds true for any type of feedback!

Please provide feedback here

Thanks for going through this assignment, we hope you found it useful and informative (and at least somewhat fun, ey?). Feedback on your work can also be provided and is just one e-mail or Discord message away!