cdp_backend.utils package#

Subpackages#

cdp_backend.utils.resources package
- Module contents

Submodules#

cdp_backend.utils.constants_utils module#

cdp_backend.utils.constants_utils.get_all_class_attr_values(cls: Type) → List[Any][source]#

Get the values of all class attributes of the provided class. Intended to be used to get all constant values of a class.

Parameters:

cls: Type: The class to get the class attribute values for.

Returns:

class_attr_values: List[Any]: The class attribute values.
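
A minimal sketch of one way to collect such values with only the standard library. It is not the package's implementation: the helper name `class_attr_values_sketch` and the `Color` class are hypothetical, and the sketch assumes that constants are the non-callable attributes whose names do not start with an underscore.

```python
from typing import Any, List


def class_attr_values_sketch(cls: type) -> List[Any]:
    """Collect the values of public, non-callable attributes defined on cls."""
    return [
        value
        for name, value in vars(cls).items()
        if not name.startswith("_") and not callable(value)
    ]


class Color:  # hypothetical constants container
    RED = "red"
    GREEN = "green"


print(class_attr_values_sketch(Color))  # ['red', 'green']
```

Because `vars(cls)` only inspects the class itself, constants inherited from a parent class would not be included in this sketch.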

cdp_backend.utils.file_utils module#

cdp_backend.utils.file_utils.append_to_stem(path: Path, addition: str) → Path[source]#

Create a path with a string appended to the path stem.

Parameters:

path: Path: The path to alter
addition: str: The string to be appended to the path stem

Returns:

path: Path: The new path with the stem addition
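
The stem manipulation itself can be sketched with `pathlib` alone. This is an illustration under assumptions, not the package's code: `append_to_stem_sketch` and the example file name are hypothetical.

```python
from pathlib import Path


def append_to_stem_sketch(path: Path, addition: str) -> Path:
    """Return a new path with `addition` inserted between the stem and the suffix."""
    return path.with_name(f"{path.stem}{addition}{path.suffix}")


new_path = append_to_stem_sketch(Path("videos/session.mp4"), "-clipped")
print(new_path.name)  # session-clipped.mp4
```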

cdp_backend.utils.file_utils.clip_and_reformat_video(video_filepath: Path, start_time: str | None, end_time: str | None, output_path: Path = None, output_format: str = 'mp4') → Path[source]#

Clip a video file to a specific time range and convert to requested output format.

Parameters:

video_filepath: Path: The filepath of the video to clip.
start_time: str: The start time of the clip in HH:MM:SS.
end_time: str: The end time of the clip in HH:MM:SS.
output_path: Path: The output path to place the clip at.
output_format: str: The output format. Default: “mp4”

Returns:

Path: The path where the new file was stored.

cdp_backend.utils.file_utils.convert_video_to_mp4(video_filepath: Path, start_time: str | None, end_time: str | None, output_path: Path = None) → Path[source]#

Convert a video to an equivalent MP4 file.

Parameters:

video_filepath: Path: The filepath of the video to convert.
start_time: str: The start time to trim the video in HH:MM:SS.
end_time: str: The end time to trim the video in HH:MM:SS.
output_path: Path: The output path to place the clip at.

Returns:

output_path: Path: The filepath of the converted MP4 video.

cdp_backend.utils.file_utils.download_video_from_session_id(credentials_file: str, session_id: str, dest: str | Path | None = None) → str | Path[source]#

Using the session_id provided, pulls the associated video and places it at the destination.

Parameters:

credentials_file: str: The path to the Google Service Account credentials JSON file used to initialize the file store connection.
session_id: str: The id of the session to retrieve the video for.
dest: Optional[Union[str, Path]]: A destination to store the file to. This is passed directly to the resource_copy function.

Returns:

Path: The destination path.

See also

cdp_backend.utils.file_utils.resource_copy: The function that downloads the video from remote host.

cdp_backend.utils.file_utils.find_proper_resize_ratio(height: int, width: int) → float[source]#

Return the proper ratio to resize a thumbnail greater than 960 x 540 pixels.

Parameters:

height: int: The height, in pixels, of the thumbnail to be resized.
width: int: The width, in pixels, of the thumbnail to be resized.

Returns:

final_ratio: float: The ratio by which the thumbnail will be resized. If the ratio is less than 1, the thumbnail is too large and should be resized by a factor of final_ratio. If the ratio is greater than or equal to 1, the thumbnail is not too large and should not be resized.
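
A minimal sketch of how such a ratio could be computed and applied. The formula is an assumption that is consistent with the description above (a value below 1 means the thumbnail exceeds the 960 x 540 bound); the package's actual computation may differ, and `resize_ratio_sketch` is a hypothetical name.

```python
MAX_WIDTH = 960  # pixels, from the bound stated above
MAX_HEIGHT = 540


def resize_ratio_sketch(height: int, width: int) -> float:
    """Return the largest scale factor that fits both dimensions within the bound."""
    return min(MAX_HEIGHT / height, MAX_WIDTH / width)


width, height = 1920, 1080
ratio = resize_ratio_sketch(height=height, width=width)

# Only shrink when the ratio says the image is too large.
if ratio < 1:
    new_size = (round(width * ratio), round(height * ratio))
else:
    new_size = (width, height)

print(ratio, new_size)  # 0.5 (960, 540)
```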

cdp_backend.utils.file_utils.generate_file_storage_name(file_uri: str, suffix: str) → str[source]#

Generate a filename using the hash of the file contents and some provided suffix.

Parameters:

file_uri: str: The URI to the file to hash.
suffix: str: The suffix to append to the hash as a part of the filename.

Returns:

dst: str: The name of the file as it should be on Google Cloud Storage.

cdp_backend.utils.file_utils.get_hover_thumbnail(video_path: str, session_content_hash: str, num_frames: int = 10, duration: float = 6.0) → str[source]#

Produce a gif hover thumbnail from an mp4 video file.

Parameters:

video_path: str: The URL of the video from which the thumbnail will be produced
session_content_hash: str: The video content hash. This will be used in the produced image file’s name
num_frames: int: Determines the number of frames in the thumbnail
duration: float: Runtime of the produced GIF. Default: 6.0 seconds

Returns:

cover_name: str: The name of the thumbnail file: Always session_content_hash + “-hover-thumbnail.gif”

cdp_backend.utils.file_utils.get_media_type(uri: str) → str | None[source]#

Get the IANA media type for the provided URI. If one could not be found, return None.

Parameters:

uri: str: The URI to get the IANA media type for.

Returns:

mtype: Optional[str]: The found matching IANA media type.
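
As a rough illustration, the standard library's `mimetypes` module can guess a media type from a URI's file extension. This sketch only shows that lookup; whether the package uses `mimetypes`, and what fallbacks it applies, is not stated here, and `media_type_sketch` is a hypothetical name.

```python
import mimetypes
from typing import Optional


def media_type_sketch(uri: str) -> Optional[str]:
    """Guess the media type from the URI's file extension, or return None."""
    mtype, _encoding = mimetypes.guess_type(uri)
    return mtype


print(media_type_sketch("https://example.com/meeting.mp4"))  # video/mp4
print(media_type_sketch("https://example.com/no-extension"))  # None
```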

cdp_backend.utils.file_utils.get_static_thumbnail(video_path: str, session_content_hash: str, seconds: int = 30) → str[source]#

Produce a PNG thumbnail image from a video file.

Parameters:

video_path: str: The URL of the video from which the thumbnail will be produced
session_content_hash: str: The video content hash. This will be used in the produced image file’s name
seconds: int: Determines after how many seconds a frame will be selected to produce the thumbnail. The default is 30 seconds

Returns:

cover_name: str: The name of the thumbnail file: Always session_content_hash + “-static-thumbnail.png”

cdp_backend.utils.file_utils.hash_file_contents(uri: str, buffer_size: int = 65536) → str[source]#

Return the SHA256 hash of a file’s content.

Parameters:

uri: str: The uri for the file to hash.
buffer_size: int: The number of bytes to read at a time. Default: 2^16 (64KB)

Returns:

hash: str: The SHA256 hash for the file contents.
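
The role of `buffer_size` can be illustrated with `hashlib`. The sketch below handles local files only, whereas the function above accepts a URI; `sha256_of_file_sketch` and the sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file_sketch(path: Path, buffer_size: int = 65536) -> str:
    """Hash a local file in buffer_size chunks so it is never fully loaded in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as open_file:
        # Read until an empty bytes object signals end of file.
        while chunk := open_file.read(buffer_size):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"hello world")
    file_hash = sha256_of_file_sketch(sample)

print(len(file_hash))  # 64 hexadecimal characters
```

Feeding the digest in chunks gives the same result as hashing the whole content at once, so the buffer size affects memory use and speed but not the hash.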

cdp_backend.utils.file_utils.parse_doc_file(document_raw: bytes) → str[source]#

Extract text from a .doc matter file.

Parameters:

document_raw: bytes: The raw document.

Returns:

str: A string of all text in the .doc file.

cdp_backend.utils.file_utils.parse_document(document_uri: str) → str[source]#

Extract text from a .doc, .docx, or .ppt matter file.

Parameters:

document_uri: str: The matter file uri.

Returns:

str: A string of all text in the matter file.

cdp_backend.utils.file_utils.parse_docx_file(zip_archive_bytes: bytes) → str[source]#

Extract text from a .docx matter file.

Parameters:

zip_archive_bytes: bytes: The raw document to be parsed. Word docx files are zip archives.

Returns:

str: A string of all text in the .docx file.
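
Because a .docx file is a zip archive, its text can be reached with the standard library alone. The following is a sketch under assumptions, not the package's parser: it reads only the main document part (`word/document.xml`), joins text runs with single spaces, and uses a hypothetical helper name and a hand-built in-memory archive.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the <w:t> text elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_sketch(zip_archive_bytes: bytes) -> str:
    """Join the text of every <w:t> run found in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(zip_archive_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return " ".join(node.text for node in root.iter(f"{W_NS}t") if node.text)


# Build a tiny in-memory archive containing just the part the sketch reads.
document_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>council</w:t></w:r></w:p></w:body>"
    "</w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)

extracted = docx_text_sketch(buffer.getvalue())
print(extracted)  # Hello council
```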

cdp_backend.utils.file_utils.parse_pdf_file(document_raw: bytes) → str[source]#

Extract text from a .pdf matter file.

Parameters:

document_raw: bytes: The raw document.

Returns:

str: A string of all text in the .pdf file.

cdp_backend.utils.file_utils.parse_pptx_file(document_raw: bytes) → str[source]#

Extract text from a .pptx matter file.

Parameters:

document_raw: bytes: The raw document.

Returns:

str: A string of all text in the .pptx file.

cdp_backend.utils.file_utils.remove_duplicate_space(parsed_text: str) → str[source]#

Remove all duplicate whitespace characters and replace with a single space.

Parameters:

parsed_text: str: The parsed text from the document.

Returns:

str: A string with no more than one consecutive space.
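
A minimal sketch of this kind of whitespace normalization with a regular expression. It assumes that "whitespace" means anything matched by `\s` (spaces, tabs, newlines); the helper name is hypothetical and the package's exact behavior, for example at the start or end of the text, may differ.

```python
import re


def collapse_whitespace_sketch(parsed_text: str) -> str:
    """Replace each run of whitespace characters with a single space."""
    return re.sub(r"\s+", " ", parsed_text)


print(collapse_whitespace_sketch("Agenda\n\n  item\t1"))  # Agenda item 1
```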

cdp_backend.utils.file_utils.rename_append_to_stem(path: Path, addition: str) → Path[source]#

Rename a file with a string appended to the path stem.

Parameters:

path: Path: The path to be renamed
addition: str: The string to be appended to the path stem

Returns:

path: Path: The new path of the renamed file

cdp_backend.utils.file_utils.rename_with_stem(path: Path, stem: str) → Path[source]#

Rename a file with a new stem.

Parameters:

path: Path: The path to be renamed
stem: str: The string to become the new stem

Returns:

path: Path: The new path of the renamed file

cdp_backend.utils.file_utils.resource_copy(uri: str, dst: str | Path | None = None, copy_suffix: bool = False, overwrite: bool = False) → str[source]#

Copy a resource (local or remote) to a local destination on the machine.

Parameters:

uri: str: The uri for the resource to copy.
dst: Optional[Union[str, Path]]: A specific destination where the copy should be placed. If None is provided, the resource is stored in the current working directory.
copy_suffix: bool: Whether to copy the file suffix or not. Default: False (do not copy with suffix)
overwrite: bool: Boolean value indicating whether or not to overwrite a local resource with the same name if it already exists.

Returns:

saved_path: str: The path the resource was copied to.

cdp_backend.utils.file_utils.should_copy_video(video_filepath: Path, output_format: str = 'mp4') → bool[source]#

Check if the video should be copied using ffmpeg StreamCopy codec or if it should be re-encoded as h264.

A video will be copied iff the following conditions are met:
- The video at video_filepath has a .mp4 extension
- The desired output format is mp4
- The video at video_filepath has a video stream with a codec of h264

Parameters:

video_filepath: Path: The filepath of the video under scrutiny.
output_format: str: The desired output format of the video at video_filepath.

Returns:

bool: True if the video should be copied, False if it should be re-encoded.
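
The three conditions above can be sketched as a pure decision function. This is an illustration only: the documented function inspects the file itself, whereas this hypothetical helper takes the video stream's codec name as an argument (`video_codec`) so that the example runs without ffmpeg.

```python
from pathlib import Path


def should_stream_copy_sketch(
    video_filepath: Path,
    output_format: str,
    video_codec: str,  # assumed to come from a separate probe of the video stream
) -> bool:
    """Return True only when all three copy conditions hold."""
    return (
        video_filepath.suffix.lower() == ".mp4"
        and output_format == "mp4"
        and video_codec == "h264"
    )


print(should_stream_copy_sketch(Path("session.mp4"), "mp4", "h264"))  # True
print(should_stream_copy_sketch(Path("session.webm"), "mp4", "vp9"))  # False
```

When any condition fails, re-encoding is needed because the existing stream cannot simply be placed into the requested container as h264.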

cdp_backend.utils.file_utils.split_audio(video_read_path: str, audio_save_path: str, overwrite: bool = False) → tuple[str, str, str][source]#

Split and store the audio from a video file using ffmpeg.

Parameters:

video_read_path: str: Path to the video to split the audio from.
audio_save_path: str: Path to where the audio should be stored.
overwrite: bool: Whether to overwrite existing files or not. Default: False (do not overwrite)

Returns:

resolved_audio_save_path: str: Path to where the split audio file was saved.
ffmpeg_stdout_path: str: Path to the ffmpeg stdout log file.
ffmpeg_stderr_path: str: Path to the ffmpeg stderr log file.

cdp_backend.utils.file_utils.vimeo_copy(uri: str, dst: Path, overwrite: bool = False) → str[source]#

Copy a video from Vimeo to a local destination on the machine for analysis.

Parameters:

uri: str: The url of the Vimeo video to copy.
dst: Path: The location to download the file to.
overwrite: bool: Boolean value indicating whether or not to overwrite a local video with the same name if it already exists.

Returns:

dst: str: The location of the downloaded file.

cdp_backend.utils.file_utils.with_stem(path: Path, stem: str) → Path[source]#

Create a path with a new stem.

Parameters:

path: Path: The path to alter
stem: str: The string to be the new stem of the path

Returns:

path: Path: The new path with the replaced stem

cdp_backend.utils.file_utils.youtube_copy(uri: str, dst: Path, overwrite: bool = False) → str[source]#

Copy a video from YouTube to a local destination on the machine.

Parameters:

uri: str: The url of the YouTube video to copy.
dst: Path: The location to download the file to.
overwrite: bool: Boolean value indicating whether or not to overwrite a local video with the same name if it already exists.

Returns:

dst: str: The location of the downloaded file.

cdp_backend.utils.string_utils module#

cdp_backend.utils.string_utils.clean_text(text: str, clean_stop_words: bool = False, clean_emojis: bool = False) → str[source]#

Clean text of common characters and extra formatting.

Parameters:

text: str: The raw text to clean.
clean_stop_words: bool: Should English stop words be removed from the raw text or not. Default: False (do not remove stop words)
clean_emojis: bool: Should emojis, emoticons, pictograms, and other characters be removed. Default: False (do not remove pictograms)

Returns:

cleaned_text: str: The cleaned text.

cdp_backend.utils.string_utils.convert_gcs_json_url_to_gsutil_form(url: str) → str[source]#

Convert a GCS JSON API url to its corresponding gsutil uri.

Parameters:

url: str: The url in GCS JSON API form.

Returns:

gsutil_url: str: The url in gsutil form. Returns empty string if the input url doesn’t match the form.
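
A hedged sketch of such a conversion. It assumes the JSON API URL contains the path shape `/b/<bucket>/o/<object>` with a URL-encoded object name and an optional query string; the regular expression, the decoding step, the helper name, and the example bucket and object are all assumptions rather than the package's actual logic.

```python
import re
from urllib.parse import unquote

# Assumed URL shape: .../b/<bucket>/o/<url-encoded object name>[?query]
_GCS_JSON_PATTERN = re.compile(r"/b/(?P<bucket>[^/]+)/o/(?P<name>[^?]+)")


def gcs_json_url_to_gsutil_sketch(url: str) -> str:
    """Return gs://<bucket>/<object>, or an empty string if the URL does not match."""
    match = _GCS_JSON_PATTERN.search(url)
    if match is None:
        return ""
    return f"gs://{match['bucket']}/{unquote(match['name'])}"


example_url = (
    "https://storage.googleapis.com/download/storage/v1"
    "/b/my-bucket/o/folder%2Fvideo.mp4?alt=media"
)
print(gcs_json_url_to_gsutil_sketch(example_url))  # gs://my-bucket/folder/video.mp4
print(repr(gcs_json_url_to_gsutil_sketch("https://example.com/video.mp4")))  # ''
```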

cdp_backend.utils.string_utils.remove_emojis(text: str) → str[source]#

Remove emojis from the provided text. Adapted, with minor changes, from this answer on Stack Overflow: https://stackoverflow.com/a/58356570.
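
For orientation, emoji removal of this kind is commonly done with a regular expression over Unicode code-point ranges. The sketch below covers only a handful of ranges chosen for illustration; it is not the pattern used by the package or by the linked answer, and `remove_emojis_sketch` is a hypothetical name.

```python
import re

# A deliberately small subset of emoji-related Unicode blocks.
_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\u2600-\u26FF"  # miscellaneous symbols
    "\u2700-\u27BF"  # dingbats
    "]+"
)


def remove_emojis_sketch(text: str) -> str:
    """Delete every character that falls in one of the ranges above."""
    return _EMOJI_PATTERN.sub("", text)


print(remove_emojis_sketch("Meeting adjourned \U0001F600\U0001F680!"))  # Meeting adjourned !
```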

Module contents#

Utilities package for cdp_backend.