cdp_backend.utils package
Subpackages
Submodules
cdp_backend.utils.constants_utils module
- cdp_backend.utils.constants_utils.get_all_class_attr_values(cls: Type) List[Any] [source]#
Get all class attribute values of the provided class. Intended for retrieving all constant values of a class.
- Parameters:
- cls: Type
The class to get the class attribute values for.
- Returns:
- class_attr_values: List[Any]
The class attribute values.
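As a rough illustration of the idea (not the package's actual implementation), the sketch below collects the values of plain class-level constants with `vars`. The helper name `class_attr_values_sketch` and the `EventStatus` class are hypothetical and exist only for this example.

```python
from typing import Any, List


def class_attr_values_sketch(cls: type) -> List[Any]:
    """Collect the values of non-dunder, non-callable class attributes."""
    return [
        value
        for name, value in vars(cls).items()
        if not name.startswith("__") and not callable(value)
    ]


class EventStatus:  # hypothetical constants class, for illustration only
    PASSED = "Passed"
    FAILED = "Failed"


values = class_attr_values_sketch(EventStatus)
```

Filtering out dunder names and callables is one possible way to separate constants from methods; the real function may use different rules.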
cdp_backend.utils.file_utils module
- cdp_backend.utils.file_utils.append_to_stem(path: Path, addition: str) Path [source]#
Create a path with a string appended to the path stem.
- Parameters:
- path: Path
The path to alter
- addition: str
The string to be appended to the path stem
- Returns:
- path: Path
The new path with the stem addition
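A minimal sketch of appending to a stem with `pathlib`, assuming the addition goes between the stem and the suffix. The helper name `append_to_stem_sketch` is hypothetical and does not touch the file system.

```python
from pathlib import Path


def append_to_stem_sketch(path: Path, addition: str) -> Path:
    """Return a new path whose stem has `addition` appended; no file is touched."""
    return path.with_name(f"{path.stem}{addition}{path.suffix}")


new_path = append_to_stem_sketch(Path("videos/session.mp4"), "-clipped")
```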
- cdp_backend.utils.file_utils.clip_and_reformat_video(video_filepath: Path, start_time: str | None, end_time: str | None, output_path: Path = None, output_format: str = 'mp4') Path [source]#
Clip a video file to a specific time range and convert to requested output format.
- Parameters:
- video_filepath: Path
The filepath of the video to clip.
- start_time: Optional[str]
The start time of the clip in HH:MM:SS.
- end_time: Optional[str]
The end time of the clip in HH:MM:SS.
- output_path: Path
The output path to place the clip at.
- output_format: str
The output format. Default: “mp4”
- Returns:
- Path:
The path where the new file was stored.
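To make the role of the optional time bounds concrete, the sketch below only builds an ffmpeg argument list and does not run it. It assumes the standard ffmpeg options `-i`, `-ss`, and `-to`; the helper name `build_clip_command` is hypothetical, and the package may invoke ffmpeg differently.

```python
from pathlib import Path
from typing import List, Optional


def build_clip_command(
    video_filepath: Path,
    start_time: Optional[str],
    end_time: Optional[str],
    output_path: Path,
) -> List[str]:
    """Build (but do not execute) an ffmpeg command that trims a video."""
    cmd = ["ffmpeg", "-i", str(video_filepath)]
    if start_time is not None:
        cmd += ["-ss", start_time]  # skip everything before HH:MM:SS
    if end_time is not None:
        cmd += ["-to", end_time]  # stop writing output at HH:MM:SS
    cmd.append(str(output_path))  # container is inferred from the extension
    return cmd


cmd = build_clip_command(Path("in.mkv"), "00:01:00", None, Path("out.mp4"))
```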
- cdp_backend.utils.file_utils.convert_video_to_mp4(video_filepath: Path, start_time: str | None, end_time: str | None, output_path: Path = None) Path [source]#
Converts a video to an equivalent MP4 file.
- Parameters:
- video_filepath: Path
The filepath of the video to convert.
- start_time: Optional[str]
The start time to trim the video in HH:MM:SS.
- end_time: Optional[str]
The end time to trim the video in HH:MM:SS.
- output_path: Path
The output path to place the clip at.
- Returns:
- output_path: Path
The filepath of the converted MP4 video.
- cdp_backend.utils.file_utils.download_video_from_session_id(credentials_file: str, session_id: str, dest: str | Path | None = None) str | Path [source]#
Pull the video associated with the provided session_id and place it at the destination.
- Parameters:
- credentials_file: str
The path to the Google Service Account credentials JSON file used to initialize the file store connection.
- session_id: str
The id of the session to retrieve the video for.
- dest: Optional[Union[str, Path]]
A destination to store the file to. This is passed directly to the resource_copy function.
- Returns:
- Path
The destination path.
See also
cdp_backend.utils.file_utils.resource_copy
The function that downloads the video from remote host.
- cdp_backend.utils.file_utils.find_proper_resize_ratio(height: int, width: int) float [source]#
Return the proper ratio to resize a thumbnail greater than 960 x 540 pixels.
- Parameters:
- height: int
The height, in pixels, of the thumbnail to be resized.
- width: int
The width, in pixels, of the thumbnail to be resized.
- Returns:
- final_ratio: float
The ratio by which the thumbnail will be resized. If the ratio is less than 1, the thumbnail is too large and should be resized by a factor of final_ratio. If the ratio is greater than or equal to 1, the thumbnail is not too large and should not be resized.
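One way to read the description above is that the ratio is the smaller of the two per-axis scale factors needed to fit within 960 x 540. The sketch below follows that assumption; the helper name and the constants `MAX_WIDTH` and `MAX_HEIGHT` are hypothetical and the real calculation may differ.

```python
MAX_WIDTH = 960  # assumed maximum thumbnail width in pixels
MAX_HEIGHT = 540  # assumed maximum thumbnail height in pixels


def resize_ratio_sketch(height: int, width: int) -> float:
    """Return the scale factor that fits (width, height) inside the maximum box."""
    return min(MAX_HEIGHT / height, MAX_WIDTH / width)


ratio_large = resize_ratio_sketch(height=1080, width=1920)
ratio_small = resize_ratio_sketch(height=270, width=480)
```

Under this reading, a 1920 x 1080 frame yields a ratio below 1 and would be shrunk, while a 480 x 270 frame yields a ratio of at least 1 and would be left alone.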
- cdp_backend.utils.file_utils.generate_file_storage_name(file_uri: str, suffix: str) str [source]#
Generate a filename using the hash of the file contents and some provided suffix.
- Parameters:
- file_uri: str
The URI to the file to hash.
- suffix: str
The suffix to append to the hash as a part of the filename.
- Returns:
- dst: str
The name of the file as it should be on Google Cloud Storage.
- cdp_backend.utils.file_utils.get_hover_thumbnail(video_path: str, session_content_hash: str, num_frames: int = 10, duration: float = 6.0) str [source]#
Produce a gif hover thumbnail from an mp4 video file.
- Parameters:
- video_path: str
The URL of the video from which the thumbnail will be produced
- session_content_hash: str
The video content hash. This will be used in the produced image file’s name
- num_frames: int
The number of frames in the thumbnail. Default: 10
- duration: float
Runtime of the produced GIF. Default: 6.0 seconds
- Returns:
- cover_name: str
The name of the thumbnail file: Always session_content_hash + “-hover-thumbnail.png”
- cdp_backend.utils.file_utils.get_media_type(uri: str) str | None [source]#
Get the IANA media type for the provided URI. If one could not be found, return None.
- Parameters:
- uri: str
The URI to get the IANA media type for.
- Returns:
- mtype: Optional[str]
The found matching IANA media type.
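A lookup like this can be approximated with the standard library's `mimetypes` module, which guesses a media type from the file extension. This is only a sketch under that assumption; the helper name `media_type_sketch` is hypothetical and the package may consult other sources.

```python
import mimetypes
from typing import Optional


def media_type_sketch(uri: str) -> Optional[str]:
    """Guess the media type from the URI's extension, or return None."""
    mtype, _encoding = mimetypes.guess_type(uri)
    return mtype


video_type = media_type_sketch("https://example.com/video.mp4")
unknown_type = media_type_sketch("file.zzzunknown")
```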
- cdp_backend.utils.file_utils.get_static_thumbnail(video_path: str, session_content_hash: str, seconds: int = 30) str [source]#
Produce a PNG thumbnail image from a video file.
- Parameters:
- video_path: str
The URL of the video from which the thumbnail will be produced
- session_content_hash: str
The video content hash. This will be used in the produced image file’s name
- seconds: int
The number of seconds into the video at which a frame is selected to produce the thumbnail. Default: 30
- Returns:
- cover_name: str
The name of the thumbnail file: Always session_content_hash + “-static-thumbnail.png”
- cdp_backend.utils.file_utils.hash_file_contents(uri: str, buffer_size: int = 65536) str [source]#
Return the SHA256 hash of a file’s content.
- Parameters:
- uri: str
The uri for the file to hash.
- buffer_size: int
The number of bytes to read at a time. Default: 2^16 (64KB)
- Returns:
- hash: str
The SHA256 hash for the file contents.
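The buffered read that `buffer_size` controls can be sketched with `hashlib` for a local file. The helper name `sha256_of_file` is hypothetical, and this sketch does not handle remote URIs, which the real function may support.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, buffer_size: int = 2**16) -> str:
    """Hash a local file in fixed-size chunks so it never sits fully in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as open_file:
        while True:
            chunk = open_file.read(buffer_size)
            if not chunk:  # empty bytes means end of file
                break
            hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"x" * 200_000)  # larger than one buffer
    digest = sha256_of_file(sample)
```

Chunked hashing gives the same digest as hashing the whole content at once, so the buffer size affects memory use only.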
- cdp_backend.utils.file_utils.parse_doc_file(document_raw: bytes) str [source]#
Extract text from a .doc matter file.
- Parameters:
- document_raw: bytes
The raw document.
- Returns:
- str:
A str of all text in the .doc file.
- cdp_backend.utils.file_utils.parse_document(document_uri: str) str [source]#
Extract text from a .doc, .docx, or .ppt matter file.
- Parameters:
- document_uri: str
The matter file uri.
- Returns:
- str:
A string of all text in the matter file.
- cdp_backend.utils.file_utils.parse_docx_file(zip_archive_bytes: bytes) str [source]#
Extract text from a .docx matter file.
- Parameters:
- zip_archive_bytes: bytes
The raw document to be parsed. Word docx files are zip archives.
- Returns:
- str:
A str of all text in the .docx file.
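Because a .docx file is a zip archive, its text can be read with the standard library alone. The sketch below assumes the main body lives in `word/document.xml` and joins the text runs of each paragraph; the helper name `docx_text_sketch` is hypothetical, and the package may rely on a dedicated parsing library instead.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_sketch(zip_archive_bytes: bytes) -> str:
    """Extract paragraph text from the bytes of a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(zip_archive_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{W_NS}p"):
        runs = [node.text or "" for node in paragraph.iter(f"{W_NS}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


# Build a tiny in-memory archive containing only the document body.
document_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>council</w:t></w:r></w:p></w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)
text = docx_text_sketch(buffer.getvalue())
```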
- cdp_backend.utils.file_utils.parse_pdf_file(document_raw: bytes) str [source]#
Extract text from a .pdf matter file.
- Parameters:
- document_raw: bytes
The raw document.
- Returns:
- str:
A str of all text in the .pdf file.
- cdp_backend.utils.file_utils.parse_pptx_file(document_raw: bytes) str [source]#
Extract text from a .pptx matter file.
- Parameters:
- document_raw: bytes
The raw document.
- Returns:
- str:
A str of all text in the .pptx file.
- cdp_backend.utils.file_utils.remove_duplicate_space(parsed_text: str) str [source]#
Remove all duplicate whitespace characters and replace with a single space.
- Parameters:
- parsed_text: str
The parsed text from the document.
- Returns:
- str:
A string with no more than one consecutive space.
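Collapsing whitespace of this kind is commonly done with a single regular expression. The sketch below is one such approach and is not necessarily how the package does it; the helper name `collapse_whitespace_sketch` is hypothetical.

```python
import re


def collapse_whitespace_sketch(parsed_text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", parsed_text)


collapsed = collapse_whitespace_sketch("Agenda  item\n\n\t1")
```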
- cdp_backend.utils.file_utils.rename_append_to_stem(path: Path, addition: str) Path [source]#
Rename a file with a string appended to the path stem.
- Parameters:
- path: Path
The path to be renamed
- addition: str
The string to be appended to the path stem
- Returns:
- path: Path
The new path of the renamed file
- cdp_backend.utils.file_utils.rename_with_stem(path: Path, stem: str) Path [source]#
Rename a file so that its path has a new stem.
- Parameters:
- path: Path
The path to be renamed
- stem: str
The string to become the new stem
- Returns:
- path: Path
The new path of the renamed file
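The rename variants differ from the path-only helpers in that they change the file on disk. A minimal sketch with `pathlib`, assuming the suffix is preserved; the helper name `rename_with_stem_sketch` is hypothetical.

```python
import tempfile
from pathlib import Path


def rename_with_stem_sketch(path: Path, stem: str) -> Path:
    """Rename the file on disk so it has a new stem and the same suffix."""
    new_path = path.with_name(f"{stem}{path.suffix}")
    path.rename(new_path)
    return new_path


with tempfile.TemporaryDirectory() as tmp_dir:
    original = Path(tmp_dir) / "raw.mp4"
    original.write_bytes(b"")  # create an empty placeholder file
    renamed = rename_with_stem_sketch(original, "session-123")
    renamed_name = renamed.name
    renamed_exists = renamed.exists()
    original_exists = original.exists()
```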
- cdp_backend.utils.file_utils.resource_copy(uri: str, dst: str | Path | None = None, copy_suffix: bool = False, overwrite: bool = False) str [source]#
Copy a resource (local or remote) to a local destination on the machine.
- Parameters:
- uri: str
The uri for the resource to copy.
- dst: Optional[Union[str, Path]]
A specific destination where the copy should be placed. If None is provided, the resource is stored in the current working directory.
- copy_suffix: bool
Whether to copy the file suffix or not. Default: False (do not copy with suffix)
- overwrite: bool
Whether to overwrite a local resource with the same name if it already exists. Default: False (do not overwrite)
- Returns:
- saved_path: str
The path the resource was copied to.
- cdp_backend.utils.file_utils.should_copy_video(video_filepath: Path, output_format: str = 'mp4') bool [source]#
Check if the video should be copied using ffmpeg StreamCopy codec or if it should be re-encoded as h264.
A video will be copied if and only if all of the following conditions are met:
- The video at video_filepath has a .mp4 extension
- The desired output format is mp4
- The video at video_filepath has a video stream with a codec of h264
- Parameters:
- video_filepath: Path
The filepath of the video under scrutiny.
- output_format: str
The desired output format of the video at video_filepath.
- Returns:
- bool:
True if the video should be copied, False if it should be re-encoded.
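The three conditions can be expressed as a single boolean check. In the sketch below the codec is passed in as a hypothetical `video_codec` parameter so the example runs without ffmpeg; the real function takes only the path and output format and presumably inspects the file itself.

```python
from pathlib import Path


def should_copy_sketch(video_filepath: Path, output_format: str, video_codec: str) -> bool:
    """Return True only when a stream copy would already yield an h264 MP4."""
    return (
        video_filepath.suffix.lower() == ".mp4"  # input is already an .mp4 file
        and output_format == "mp4"  # requested output is mp4
        and video_codec == "h264"  # video stream is already h264
    )


copy_ok = should_copy_sketch(Path("session.mp4"), "mp4", "h264")
needs_encode = should_copy_sketch(Path("session.mkv"), "mp4", "h264")
```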
- cdp_backend.utils.file_utils.split_audio(video_read_path: str, audio_save_path: str, overwrite: bool = False) tuple[str, str, str] [source]#
Split and store the audio from a video file using ffmpeg.
- Parameters:
- video_read_path: str
Path to the video to split the audio from.
- audio_save_path: str
Path to where the audio should be stored.
- overwrite: bool
Whether to overwrite existing files or not. Default: False (do not overwrite)
- Returns:
- resolved_audio_save_path: str
Path to where the split audio file was saved.
- ffmpeg_stdout_path: str
Path to the ffmpeg stdout log file.
- ffmpeg_stderr_path: str
Path to the ffmpeg stderr log file.
- cdp_backend.utils.file_utils.vimeo_copy(uri: str, dst: Path, overwrite: bool = False) str [source]#
Copy a video from Vimeo to a local destination on the machine for analysis.
- Parameters:
- uri: str
The url of the Vimeo video to copy.
- dst: Path
The location to download the file to.
- overwrite: bool
Boolean value indicating whether or not to overwrite a local video with the same name if it already exists.
- Returns:
- dst: str
The location of the downloaded file.
- cdp_backend.utils.file_utils.with_stem(path: Path, stem: str) Path [source]#
Create a path with a new stem.
- Parameters:
- path: Path
The path to alter
- stem: str
The string to be the new stem of the path
- Returns:
- path: Path
The new path with the replaced stem
- cdp_backend.utils.file_utils.youtube_copy(uri: str, dst: Path, overwrite: bool = False) str [source]#
Copy a video from YouTube to a local destination on the machine.
- Parameters:
- uri: str
The url of the YouTube video to copy.
- dst: Path
The location to download the file to.
- overwrite: bool
Boolean value indicating whether or not to overwrite a local video with the same name if it already exists.
- Returns:
- dst: str
The location of the downloaded file.
cdp_backend.utils.string_utils module
- cdp_backend.utils.string_utils.clean_text(text: str, clean_stop_words: bool = False, clean_emojis: bool = False) str [source]#
Clean text of common characters and extra formatting.
- Parameters:
- text: str
The raw text to clean.
- clean_stop_words: bool
Whether to remove English stop words from the raw text. Default: False (do not remove stop words)
- clean_emojis: bool
Whether to remove emojis, emoticons, pictograms, and other such characters. Default: False (do not remove emojis)
- Returns:
- cleaned_text: str
The cleaned text.
- cdp_backend.utils.string_utils.convert_gcs_json_url_to_gsutil_form(url: str) str [source]#
Convert a GCS JSON API url to its corresponding gsutil uri.
- Parameters:
- url: str
The url in GCS JSON API form.
- Returns:
- gsutil_url: str
The url in gsutil form. Returns empty string if the input url doesn’t match the form.
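The conversion can be sketched with a regular expression, assuming the input follows the GCS JSON API object path `/storage/v1/b/<bucket>/o/<object>` (optionally under `/download/`) with a URL-encoded object name. The pattern and the helper name `to_gsutil_sketch` are assumptions for illustration; the exact form the package matches is not stated here.

```python
import re
from urllib.parse import unquote

# Assumed shape of a GCS JSON API object URL; any query string is ignored.
GCS_JSON_PATTERN = re.compile(
    r"^https://storage\.googleapis\.com/(?:download/)?storage/v1"
    r"/b/(?P<bucket>[^/]+)/o/(?P<obj>[^?]+)"
)


def to_gsutil_sketch(url: str) -> str:
    """Return gs://bucket/object, or an empty string if the url does not match."""
    match = GCS_JSON_PATTERN.match(url)
    if match is None:
        return ""
    return f"gs://{match.group('bucket')}/{unquote(match.group('obj'))}"


gsutil_url = to_gsutil_sketch(
    "https://storage.googleapis.com/storage/v1/b/my-bucket/o/videos%2Fclip.mp4?alt=media"
)
no_match = to_gsutil_sketch("https://example.com/file")
```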
- cdp_backend.utils.string_utils.remove_emojis(text: str) str [source]#
Remove emojis from the provided text. Adapted with minor changes from this Stack Overflow answer: https://stackoverflow.com/a/58356570.
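The general technique is a regular expression over Unicode ranges. The sketch below covers only three emoji blocks as an illustration and is not the pattern used by the package or the linked answer; the names `EMOJI_PATTERN` and `remove_emojis_sketch` are hypothetical.

```python
import re

# A deliberately small subset of emoji code point ranges, for illustration.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "]+"
)


def remove_emojis_sketch(text: str) -> str:
    """Strip characters that fall in the ranges listed above."""
    return EMOJI_PATTERN.sub("", text)


cleaned = remove_emojis_sketch("vote passed \U0001F600")
```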
Module contents
Utilities package for cdp_backend.