macro_eeg_model.data_prep.data_preparator
Classes
DataPreparator – A class to prepare and process data from directories containing CSV files with connectivity data across subjects.
Module Contents
- class macro_eeg_model.data_prep.data_preparator.DataPreparator
A class to prepare and process data from directories containing CSV files with connectivity data across subjects. The processed data is saved as a NumPy array after averaging across multiple subjects.
- prep_and_save(directory_name, included_word, delimiter, name)
Handles the prerequisites for preparing and saving the data from a specified directory within the Julich data path (see src.utils.paths.Paths), then performs the actual data preparation and saving using _prep_and_save_data(). This method filters the files in the directory by a word that must appear in their filenames, converts them into NumPy arrays, calculates an average array, and saves it to a specified path.
- Parameters:
directory_name (str) – The name of the directory containing the subject folders.
included_word (str) – The word that should be included in the CSV filenames to be processed.
delimiter (str) – The delimiter used in the CSV files.
name (str) – The name to use when saving the final averaged array.
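The prerequisite step can be pictured as resolving the target directory and listing its subject folders. The sketch below is a minimal illustration, not the package's code: `resolve_directory_and_subjects` and `data_root` are hypothetical names, `data_root` stands in for the Julich data path provided by src.utils.paths.Paths, and it assumes every subject has its own folder directly under the target directory.

```python
from pathlib import Path


def resolve_directory_and_subjects(data_root, directory_name):
    """Hypothetical helper sketching the prerequisite step of prep_and_save().

    Assumes `data_root` stands in for the Julich data path and that each
    subject has its own folder directly under `data_root / directory_name`.
    """
    directory = Path(data_root) / directory_name
    if not directory.is_dir():
        raise FileNotFoundError(f"No such directory: {directory}")
    # Keep only subdirectories (one per subject); loose files are ignored.
    # Sorting gives a deterministic subject order.
    subjects = sorted(p.name for p in directory.iterdir() if p.is_dir())
    return directory, subjects
```

The returned pair matches the `directory` and `subjects` arguments that _prep_and_save_data() expects.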
- _prep_and_save_data(directory, subjects, included_word, delimiter, name)
Extracts the relevant CSV files based on the included word using _extract_csv_files(), converts them to NumPy arrays using _get_arrays_from_files(), computes an average array using _calculate_avg_array(), and saves the result as a .npy file.
- Parameters:
directory (str or pathlib.Path) – The path to the directory containing the subject folders.
subjects (list) – The list of subject folder names.
included_word (str) – The word that should be included in the CSV filenames to be processed.
delimiter (str) – The delimiter used in the CSV files.
name (str) – The name to use when saving the final averaged array.
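The final saving step might look like the following minimal sketch. `save_averaged_array` and `output_dir` are hypothetical: where the real method writes its output is not documented in this section, so only the use of `name` for the filename and the .npy format follow the text.

```python
from pathlib import Path

import numpy as np


def save_averaged_array(avg_array, output_dir, name):
    """Save `avg_array` as `<output_dir>/<name>.npy` and return the path.

    Hypothetical helper: `output_dir` is an assumed location, not the
    documented one.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{name}.npy"
    # Binary NumPy format; the array can be reloaded with np.load(path).
    np.save(path, avg_array)
    return path
```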
- static _extract_csv_files(directory, subjects, included_word)
Extracts the names of CSV files whose filenames contain a specific word, searching the folder of each subject within the directory.
- Parameters:
directory (str or pathlib.Path) – The path to the directory containing the subject folders.
subjects (list) – The list of subject folder names.
included_word (str) – The word that must be included in the filenames.
- Returns:
A list of filenames that match the criteria.
- Return type:
list
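A minimal sketch of the filename search, written as a standalone function with the hypothetical name `extract_csv_files`. It assumes a case-sensitive substring match and a `.csv` suffix check; the real method may differ in both.

```python
from pathlib import Path


def extract_csv_files(directory, subjects, included_word):
    """Collect the names of CSV files whose filename contains `included_word`.

    Sketch only: case-sensitive substring match plus a `.csv` suffix check
    are assumptions. Only bare filenames (not full paths) are returned.
    """
    directory = Path(directory)
    matches = []
    for subject in subjects:
        # Sorted iteration keeps the output order deterministic.
        for path in sorted((directory / subject).iterdir()):
            if (
                path.is_file()
                and path.suffix.lower() == ".csv"
                and included_word in path.name
            ):
                matches.append(path.name)
    return matches
```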
- _get_arrays_from_files(directory, subjects, files, delimiter=',')
Retrieves and converts the relevant CSV files into NumPy arrays using _convert_csv_file_to_numpy_array(). For each subject in the directory, this method identifies the files to be processed, converts them into NumPy arrays, and collects them for further processing.
- Parameters:
directory (str or pathlib.Path) – The path to the directory containing the subject folders.
subjects (list) – The list of subject folder names.
files (list) – The list of filenames to be processed.
delimiter (str, optional) – The delimiter used in the CSV files (default is ‘,’).
- Returns:
A list of NumPy arrays corresponding to the processed CSV files.
- Return type:
list
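The per-subject loading loop could be sketched as below. `get_arrays_from_files` is a hypothetical standalone counterpart of the documented method; it assumes `files` holds bare filenames and that a subject folder may contain only some of them, and it uses `np.loadtxt` in place of `_convert_csv_file_to_numpy_array()`.

```python
from pathlib import Path

import numpy as np


def get_arrays_from_files(directory, subjects, files, delimiter=","):
    """Load every listed CSV file found in each subject folder.

    Sketch only: `files` is assumed to contain bare filenames, and
    np.loadtxt stands in for the documented conversion helper.
    """
    directory = Path(directory)
    wanted = set(files)  # fast membership checks
    arrays = []
    for subject in subjects:
        for path in sorted((directory / subject).iterdir()):
            if path.name in wanted:
                # ndmin=2 keeps single-row files two-dimensional.
                arrays.append(np.loadtxt(path, delimiter=delimiter, ndmin=2))
    return arrays
```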
- static _convert_csv_file_to_numpy_array(file_path, delimiter)
Converts a CSV file into a NumPy array.
- Parameters:
file_path (str or pathlib.Path) – The full path to the CSV file.
delimiter (str) – The delimiter used in the CSV file.
- Returns:
A NumPy array representing the data from the CSV file.
- Return type:
numpy.ndarray
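For a purely numeric CSV file, NumPy's own `np.loadtxt` covers this conversion. The sketch assumes no header row and no missing cells; `convert_csv_file_to_numpy_array` is a hypothetical stand-in for the documented method.

```python
import numpy as np


def convert_csv_file_to_numpy_array(file_path, delimiter):
    """Read a purely numeric CSV file into a 2-D float array.

    Sketch only: assumes no header row and no empty cells. For files with
    blank cells, np.genfromtxt (which fills gaps with NaN) is an alternative.
    """
    # ndmin=2 ensures a one-row file still yields a 2-D array.
    return np.loadtxt(file_path, delimiter=delimiter, dtype=float, ndmin=2)
```

`np.loadtxt` also accepts file-like objects, so the function can be exercised with `io.StringIO` instead of a file on disk.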
- static _calculate_avg_array(numpy_arrays)
Computes the average of each element across multiple NumPy arrays, excluding the highest and lowest 20% of values (to reduce the impact of outliers), and returns the resulting array.
- Parameters:
numpy_arrays (list) – A list of NumPy arrays to average.
- Returns:
A NumPy array containing the average values.
- Return type:
numpy.ndarray
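The trimmed average can be expressed with plain NumPy by stacking the arrays, sorting along the subject axis, and averaging the middle slice. This is a sketch under one stated assumption: the number of values dropped from each end is `int(0.2 * n)` for `n` arrays, the same flooring rule as `scipy.stats.trim_mean`; the documentation does not say how the real method rounds. `calculate_avg_array` and its `proportion_to_cut` parameter are hypothetical names.

```python
import numpy as np


def calculate_avg_array(numpy_arrays, proportion_to_cut=0.2):
    """Element-wise trimmed mean across equally shaped arrays.

    For each element position, the n values (one per array) are sorted,
    the lowest and highest int(proportion_to_cut * n) are dropped, and the
    rest are averaged. The flooring rule is an assumption.
    """
    stacked = np.stack(numpy_arrays, axis=0)  # shape (n, *array_shape)
    n = stacked.shape[0]
    k = int(proportion_to_cut * n)  # values removed from each end
    ordered = np.sort(stacked, axis=0)  # sort each position across arrays
    return ordered[k : n - k].mean(axis=0)
```

With this flooring rule, fewer than five arrays means nothing is trimmed and the result equals the plain mean. If SciPy is available, `scipy.stats.trim_mean(np.stack(numpy_arrays), 0.2, axis=0)` computes the same quantity.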