dsa_tdb package
Submodules
- dsa_tdb.cli module
- dsa_tdb.core module
- dsa_tdb.etl module
- dsa_tdb.fetch module
- dsa_tdb.types module
ALL_PLATFORMS_ENTRY_VALUE
ALL_PLATFORMS_PLATFORM_NAME
AccountType
AggregateFileFormat
AggregateWriteMode
AggregationConfig
AggregationConfig.Config
AggregationConfig.columns_datetime
AggregationConfig.columns_to_fill_bool
AggregationConfig.columns_to_fill_str
AggregationConfig.columns_to_group
AggregationConfig.columns_to_import
AggregationConfig.compute_restriction_duration
AggregationConfig.compute_time_to_action
AggregationConfig.content_date_range
AggregationConfig.created_at_date_range
AggregationConfig.created_at_dt_floor
AggregationConfig.decision_date_range
AggregationConfig.delete_original_columns
AggregationConfig.fillna_bool_value
AggregationConfig.fillna_str_value
AggregationConfig.horizontally_explode_columns
AggregationConfig.input_format
AggregationConfig.model_config
AggregationConfig.normalize_content_type_other
AggregationConfig.normalize_platform_name
AggregationConfig.output_format
AggregationConfig.platforms_to_exclude
AggregationConfig.write_mode
AutomatedDecision
AutomatedDetection
BooleanOperator
CELERY_TASK_QUEUE
CHUNKED_FILES_SUBFOLDER_NAME
CHUNKED_FILE_SUCCESS_NAME
Category
Category.STATEMENT_CATEGORY_ANIMAL_WELFARE
Category.STATEMENT_CATEGORY_DATA_PROTECTION_AND_PRIVACY_VIOLATIONS
Category.STATEMENT_CATEGORY_ILLEGAL_OR_HARMFUL_SPEECH
Category.STATEMENT_CATEGORY_INTELLECTUAL_PROPERTY_INFRINGEMENTS
Category.STATEMENT_CATEGORY_NEGATIVE_EFFECTS_ON_CIVIC_DISCOURSE_OR_ELECTIONS
Category.STATEMENT_CATEGORY_NON_CONSENSUAL_BEHAVIOUR
Category.STATEMENT_CATEGORY_PORNOGRAPHY_OR_SEXUALIZED_CONTENT
Category.STATEMENT_CATEGORY_PROTECTION_OF_MINORS
Category.STATEMENT_CATEGORY_RISK_FOR_PUBLIC_SECURITY
Category.STATEMENT_CATEGORY_SCAMS_AND_FRAUD
Category.STATEMENT_CATEGORY_SCOPE_OF_PLATFORM_SERVICE
Category.STATEMENT_CATEGORY_SELF_HARM
Category.STATEMENT_CATEGORY_UNSAFE_AND_ILLEGAL_PRODUCTS
Category.STATEMENT_CATEGORY_VIOLENCE
CategoryAddition
CategorySpecification
ContentLanguage
ContentLanguage.BG
ContentLanguage.CS
ContentLanguage.DA
ContentLanguage.DE
ContentLanguage.EL
ContentLanguage.EN
ContentLanguage.ES
ContentLanguage.ET
ContentLanguage.FI
ContentLanguage.FR
ContentLanguage.GA
ContentLanguage.HR
ContentLanguage.HU
ContentLanguage.IT
ContentLanguage.LT
ContentLanguage.LV
ContentLanguage.MT
ContentLanguage.NL
ContentLanguage.PL
ContentLanguage.PT
ContentLanguage.RO
ContentLanguage.SK
ContentLanguage.SL
ContentLanguage.SV
ContentType
ContentType.CONTENT_TYPE_ACCOUNT
ContentType.CONTENT_TYPE_AD
ContentType.CONTENT_TYPE_APP
ContentType.CONTENT_TYPE_AUDIO
ContentType.CONTENT_TYPE_HASHTAG
ContentType.CONTENT_TYPE_IMAGE
ContentType.CONTENT_TYPE_LINK
ContentType.CONTENT_TYPE_OTHER
ContentType.CONTENT_TYPE_PRODUCT
ContentType.CONTENT_TYPE_STICKER
ContentType.CONTENT_TYPE_SYNTHETIC_MEDIA
ContentType.CONTENT_TYPE_TEXT
ContentType.CONTENT_TYPE_VIDEO
DAILY_FILES_SUBFOLDER_REGEX
DAILY_FILES_SUBFOLDER_TEMPLATE
DAILY_FILES_TABLE_URL
DAILY_FILE_CHECKSUM_EXTENSION
DAILY_FILE_DATE_FORMAT
DAILY_FILE_EXTENSION
DAILY_FILE_NAME_REGEX
DAILY_FILE_NAME_TEMPLATE
DAILY_FILE_URL_TEMPLATE
DatetimeColumns
DecisionAccount
DecisionGround
DecisionMonetary
DecisionProvision
DecisionVisibility
DecisionVisibility.DECISION_VISIBILITY_CONTENT_AGE_RESTRICTED
DecisionVisibility.DECISION_VISIBILITY_CONTENT_DEMOTED
DecisionVisibility.DECISION_VISIBILITY_CONTENT_DISABLED
DecisionVisibility.DECISION_VISIBILITY_CONTENT_INTERACTION_RESTRICTED
DecisionVisibility.DECISION_VISIBILITY_CONTENT_LABELLED
DecisionVisibility.DECISION_VISIBILITY_CONTENT_REMOVED
DecisionVisibility.DECISION_VISIBILITY_OTHER
EXPLODED_COLUMNS
EXPLODED_COLUMNS.CONTENT_TYPE_ACCOUNT
EXPLODED_COLUMNS.CONTENT_TYPE_AD
EXPLODED_COLUMNS.CONTENT_TYPE_APP
EXPLODED_COLUMNS.CONTENT_TYPE_AUDIO
EXPLODED_COLUMNS.CONTENT_TYPE_HASHTAG
EXPLODED_COLUMNS.CONTENT_TYPE_IMAGE
EXPLODED_COLUMNS.CONTENT_TYPE_LINK
EXPLODED_COLUMNS.CONTENT_TYPE_OTHER
EXPLODED_COLUMNS.CONTENT_TYPE_PRODUCT
EXPLODED_COLUMNS.CONTENT_TYPE_STICKER
EXPLODED_COLUMNS.CONTENT_TYPE_SYNTHETIC_MEDIA
EXPLODED_COLUMNS.CONTENT_TYPE_TEXT
EXPLODED_COLUMNS.CONTENT_TYPE_VIDEO
EXPLODED_COLUMNS.DECISION_VISIBILITY_CONTENT_AGE_RESTRICTED
EXPLODED_COLUMNS.DECISION_VISIBILITY_CONTENT_DEMOTED
EXPLODED_COLUMNS.DECISION_VISIBILITY_CONTENT_DISABLED
EXPLODED_COLUMNS.DECISION_VISIBILITY_CONTENT_INTERACTION_RESTRICTED
EXPLODED_COLUMNS.DECISION_VISIBILITY_CONTENT_LABELLED
EXPLODED_COLUMNS.DECISION_VISIBILITY_CONTENT_REMOVED
EXPLODED_COLUMNS.DECISION_VISIBILITY_OTHER
EXPLODED_COLUMNS.KEYWORD_ADULT_SEXUAL_MATERIAL
EXPLODED_COLUMNS.KEYWORD_AGE_SPECIFIC_RESTRICTIONS
EXPLODED_COLUMNS.KEYWORD_AGE_SPECIFIC_RESTRICTIONS_MINORS
EXPLODED_COLUMNS.KEYWORD_ANIMAL_HARM
EXPLODED_COLUMNS.KEYWORD_BIOMETRIC_DATA_BREACH
EXPLODED_COLUMNS.KEYWORD_CHILD_SEXUAL_ABUSE_MATERIAL
EXPLODED_COLUMNS.KEYWORD_CONTENT_PROMOTING_EATING_DISORDERS
EXPLODED_COLUMNS.KEYWORD_COORDINATED_HARM
EXPLODED_COLUMNS.KEYWORD_COPYRIGHT_INFRINGEMENT
EXPLODED_COLUMNS.KEYWORD_DANGEROUS_TOYS
EXPLODED_COLUMNS.KEYWORD_DATA_FALSIFICATION
EXPLODED_COLUMNS.KEYWORD_DEFAMATION
EXPLODED_COLUMNS.KEYWORD_DESIGN_INFRINGEMENT
EXPLODED_COLUMNS.KEYWORD_DISCRIMINATION
EXPLODED_COLUMNS.KEYWORD_DISINFORMATION
EXPLODED_COLUMNS.KEYWORD_FOREIGN_INFORMATION_MANIPULATION
EXPLODED_COLUMNS.KEYWORD_GENDER_BASED_VIOLENCE
EXPLODED_COLUMNS.KEYWORD_GEOGRAPHICAL_REQUIREMENTS
EXPLODED_COLUMNS.KEYWORD_GEOGRAPHIC_INDICATIONS_INFRINGEMENT
EXPLODED_COLUMNS.KEYWORD_GOODS_SERVICES_NOT_PERMITTED
EXPLODED_COLUMNS.KEYWORD_GROOMING_SEXUAL_ENTICEMENT_MINORS
EXPLODED_COLUMNS.KEYWORD_HATE_SPEECH
EXPLODED_COLUMNS.KEYWORD_HUMAN_EXPLOITATION
EXPLODED_COLUMNS.KEYWORD_HUMAN_TRAFFICKING
EXPLODED_COLUMNS.KEYWORD_ILLEGAL_ORGANIZATIONS
EXPLODED_COLUMNS.KEYWORD_IMAGE_BASED_SEXUAL_ABUSE
EXPLODED_COLUMNS.KEYWORD_IMPERSONATION_ACCOUNT_HIJACKING
EXPLODED_COLUMNS.KEYWORD_INAUTHENTIC_ACCOUNTS
EXPLODED_COLUMNS.KEYWORD_INAUTHENTIC_LISTINGS
EXPLODED_COLUMNS.KEYWORD_INAUTHENTIC_USER_REVIEWS
EXPLODED_COLUMNS.KEYWORD_INCITEMENT_VIOLENCE_HATRED
EXPLODED_COLUMNS.KEYWORD_INSUFFICIENT_INFORMATION_TRADERS
EXPLODED_COLUMNS.KEYWORD_LANGUAGE_REQUIREMENTS
EXPLODED_COLUMNS.KEYWORD_MISINFORMATION
EXPLODED_COLUMNS.KEYWORD_MISSING_PROCESSING_GROUND
EXPLODED_COLUMNS.KEYWORD_NON_CONSENSUAL_IMAGE_SHARING
EXPLODED_COLUMNS.KEYWORD_NON_CONSENSUAL_ITEMS_DEEPFAKE
EXPLODED_COLUMNS.KEYWORD_NUDITY
EXPLODED_COLUMNS.KEYWORD_ONLINE_BULLYING_INTIMIDATION
EXPLODED_COLUMNS.KEYWORD_OTHER
EXPLODED_COLUMNS.KEYWORD_PATENT_INFRINGEMENT
EXPLODED_COLUMNS.KEYWORD_PHISHING
EXPLODED_COLUMNS.KEYWORD_PYRAMID_SCHEMES
EXPLODED_COLUMNS.KEYWORD_REGULATED_GOODS_SERVICES
EXPLODED_COLUMNS.KEYWORD_RIGHT_TO_BE_FORGOTTEN
EXPLODED_COLUMNS.KEYWORD_RISK_ENVIRONMENTAL_DAMAGE
EXPLODED_COLUMNS.KEYWORD_RISK_PUBLIC_HEALTH
EXPLODED_COLUMNS.KEYWORD_SELF_MUTILATION
EXPLODED_COLUMNS.KEYWORD_STALKING
EXPLODED_COLUMNS.KEYWORD_SUICIDE
EXPLODED_COLUMNS.KEYWORD_TERRORIST_CONTENT
EXPLODED_COLUMNS.KEYWORD_TRADEMARK_INFRINGEMENT
EXPLODED_COLUMNS.KEYWORD_TRADE_SECRET_INFRINGEMENT
EXPLODED_COLUMNS.KEYWORD_UNLAWFUL_SALE_ANIMALS
EXPLODED_COLUMNS.KEYWORD_UNSAFE_CHALLENGES
EXPLODED_COLUMNS.STATEMENT_CATEGORY_ANIMAL_WELFARE
EXPLODED_COLUMNS.STATEMENT_CATEGORY_DATA_PROTECTION_AND_PRIVACY_VIOLATIONS
EXPLODED_COLUMNS.STATEMENT_CATEGORY_ILLEGAL_OR_HARMFUL_SPEECH
EXPLODED_COLUMNS.STATEMENT_CATEGORY_INTELLECTUAL_PROPERTY_INFRINGEMENTS
EXPLODED_COLUMNS.STATEMENT_CATEGORY_NEGATIVE_EFFECTS_ON_CIVIC_DISCOURSE_OR_ELECTIONS
EXPLODED_COLUMNS.STATEMENT_CATEGORY_NON_CONSENSUAL_BEHAVIOUR
EXPLODED_COLUMNS.STATEMENT_CATEGORY_PORNOGRAPHY_OR_SEXUALIZED_CONTENT
EXPLODED_COLUMNS.STATEMENT_CATEGORY_PROTECTION_OF_MINORS
EXPLODED_COLUMNS.STATEMENT_CATEGORY_RISK_FOR_PUBLIC_SECURITY
EXPLODED_COLUMNS.STATEMENT_CATEGORY_SCAMS_AND_FRAUD
EXPLODED_COLUMNS.STATEMENT_CATEGORY_SCOPE_OF_PLATFORM_SERVICE
EXPLODED_COLUMNS.STATEMENT_CATEGORY_SELF_HARM
EXPLODED_COLUMNS.STATEMENT_CATEGORY_UNSAFE_AND_ILLEGAL_PRODUCTS
EXPLODED_COLUMNS.STATEMENT_CATEGORY_VIOLENCE
FilteringConfig
FilteringConfig.Config
FilteringConfig.account_type
FilteringConfig.automated_decision
FilteringConfig.automated_detection
FilteringConfig.bool_columns_to_check
FilteringConfig.bool_columns_to_check_operator
FilteringConfig.category
FilteringConfig.category_addition
FilteringConfig.category_specification
FilteringConfig.category_specification_other
FilteringConfig.category_specification_other_to_lower
FilteringConfig.columns_datetime
FilteringConfig.columns_to_fill_bool
FilteringConfig.columns_to_fill_str
FilteringConfig.columns_to_import
FilteringConfig.content_date_range
FilteringConfig.content_language
FilteringConfig.content_type
FilteringConfig.content_type_other
FilteringConfig.content_type_other_to_lower
FilteringConfig.created_at_date_range
FilteringConfig.created_at_dt_floor
FilteringConfig.decision_account
FilteringConfig.decision_date_range
FilteringConfig.decision_facts
FilteringConfig.decision_facts_to_lower
FilteringConfig.decision_ground
FilteringConfig.decision_ground_reference_url
FilteringConfig.decision_ground_reference_url_to_lower
FilteringConfig.decision_monetary
FilteringConfig.decision_monetary_other
FilteringConfig.decision_monetary_other_to_lower
FilteringConfig.decision_provision
FilteringConfig.decision_visibility
FilteringConfig.decision_visibility_other
FilteringConfig.decision_visibility_other_to_lower
FilteringConfig.delete_original_columns
FilteringConfig.downstream_sampling
FilteringConfig.end_date_account_restriction
FilteringConfig.end_date_monetary_restriction
FilteringConfig.end_date_service_restriction
FilteringConfig.end_date_visibility_restriction
FilteringConfig.fillna_bool_value
FilteringConfig.fillna_str_value
FilteringConfig.horizontally_explode_columns
FilteringConfig.illegal_content_explanation
FilteringConfig.illegal_content_explanation_to_lower
FilteringConfig.illegal_content_legal_ground
FilteringConfig.illegal_content_legal_ground_to_lower
FilteringConfig.incompatible_content_explanation
FilteringConfig.incompatible_content_explanation_to_lower
FilteringConfig.incompatible_content_ground
FilteringConfig.incompatible_content_ground_to_lower
FilteringConfig.incompatible_content_illegal
FilteringConfig.input_format
FilteringConfig.model_config
FilteringConfig.normalize_content_type_other
FilteringConfig.normalize_platform_name
FilteringConfig.output_format
FilteringConfig.platforms_to_exclude
FilteringConfig.platforms_to_include
FilteringConfig.source_identity
FilteringConfig.source_identity_to_lower
FilteringConfig.source_type
FilteringConfig.territorial_scope
FilteringConfig.upstream_sampling
FilteringConfig.write_mode
IncompatibleContentIllegal
InputFileFormat
Keyword
Keyword.KEYWORD_ADULT_SEXUAL_MATERIAL
Keyword.KEYWORD_AGE_SPECIFIC_RESTRICTIONS
Keyword.KEYWORD_AGE_SPECIFIC_RESTRICTIONS_MINORS
Keyword.KEYWORD_ANIMAL_HARM
Keyword.KEYWORD_BIOMETRIC_DATA_BREACH
Keyword.KEYWORD_CHILD_SEXUAL_ABUSE_MATERIAL
Keyword.KEYWORD_CONTENT_PROMOTING_EATING_DISORDERS
Keyword.KEYWORD_COORDINATED_HARM
Keyword.KEYWORD_COPYRIGHT_INFRINGEMENT
Keyword.KEYWORD_DANGEROUS_TOYS
Keyword.KEYWORD_DATA_FALSIFICATION
Keyword.KEYWORD_DEFAMATION
Keyword.KEYWORD_DESIGN_INFRINGEMENT
Keyword.KEYWORD_DISCRIMINATION
Keyword.KEYWORD_DISINFORMATION
Keyword.KEYWORD_FOREIGN_INFORMATION_MANIPULATION
Keyword.KEYWORD_GENDER_BASED_VIOLENCE
Keyword.KEYWORD_GEOGRAPHICAL_REQUIREMENTS
Keyword.KEYWORD_GEOGRAPHIC_INDICATIONS_INFRINGEMENT
Keyword.KEYWORD_GOODS_SERVICES_NOT_PERMITTED
Keyword.KEYWORD_GROOMING_SEXUAL_ENTICEMENT_MINORS
Keyword.KEYWORD_HATE_SPEECH
Keyword.KEYWORD_HUMAN_EXPLOITATION
Keyword.KEYWORD_HUMAN_TRAFFICKING
Keyword.KEYWORD_ILLEGAL_ORGANIZATIONS
Keyword.KEYWORD_IMAGE_BASED_SEXUAL_ABUSE
Keyword.KEYWORD_IMPERSONATION_ACCOUNT_HIJACKING
Keyword.KEYWORD_INAUTHENTIC_ACCOUNTS
Keyword.KEYWORD_INAUTHENTIC_LISTINGS
Keyword.KEYWORD_INAUTHENTIC_USER_REVIEWS
Keyword.KEYWORD_INCITEMENT_VIOLENCE_HATRED
Keyword.KEYWORD_INSUFFICIENT_INFORMATION_TRADERS
Keyword.KEYWORD_LANGUAGE_REQUIREMENTS
Keyword.KEYWORD_MISINFORMATION
Keyword.KEYWORD_MISSING_PROCESSING_GROUND
Keyword.KEYWORD_NON_CONSENSUAL_IMAGE_SHARING
Keyword.KEYWORD_NON_CONSENSUAL_ITEMS_DEEPFAKE
Keyword.KEYWORD_NUDITY
Keyword.KEYWORD_ONLINE_BULLYING_INTIMIDATION
Keyword.KEYWORD_OTHER
Keyword.KEYWORD_PATENT_INFRINGEMENT
Keyword.KEYWORD_PHISHING
Keyword.KEYWORD_PYRAMID_SCHEMES
Keyword.KEYWORD_REGULATED_GOODS_SERVICES
Keyword.KEYWORD_RIGHT_TO_BE_FORGOTTEN
Keyword.KEYWORD_RISK_ENVIRONMENTAL_DAMAGE
Keyword.KEYWORD_RISK_PUBLIC_HEALTH
Keyword.KEYWORD_SELF_MUTILATION
Keyword.KEYWORD_STALKING
Keyword.KEYWORD_SUICIDE
Keyword.KEYWORD_TERRORIST_CONTENT
Keyword.KEYWORD_TRADEMARK_INFRINGEMENT
Keyword.KEYWORD_TRADE_SECRET_INFRINGEMENT
Keyword.KEYWORD_UNLAWFUL_SALE_ANIMALS
Keyword.KEYWORD_UNSAFE_CHALLENGES
LoadFileArguments
LoadFileArguments.Config
LoadFileArguments.columns_datetime
LoadFileArguments.columns_to_fill_bool
LoadFileArguments.columns_to_fill_str
LoadFileArguments.columns_to_import
LoadFileArguments.compute_restriction_duration
LoadFileArguments.compute_time_to_action
LoadFileArguments.content_date_range
LoadFileArguments.created_at_date_range
LoadFileArguments.decision_date_range
LoadFileArguments.del_original
LoadFileArguments.dump_files_pattern
LoadFileArguments.explode_cols
LoadFileArguments.fillna_bool
LoadFileArguments.fillna_str
LoadFileArguments.input_format
LoadFileArguments.model_config
LoadFileArguments.normalize_content_type_other
LoadFileArguments.normalize_platform_name
PreprocessArguments
PreprocessArguments.Config
PreprocessArguments.check_sha1
PreprocessArguments.chunk_format
PreprocessArguments.chunk_size
PreprocessArguments.delete_original
PreprocessArguments.do_chunking
PreprocessArguments.dump_files_folder
PreprocessArguments.force_sha1
PreprocessArguments.from_date
PreprocessArguments.loglevel
PreprocessArguments.model_config
PreprocessArguments.n_processes
PreprocessArguments.override_chunked_subfolder
PreprocessArguments.platform
PreprocessArguments.platforms_to_exclude
PreprocessArguments.to_date
PreprocessArguments.version
RawAndExplodedColumn
RawAndExplodedColumn.CONTENT_TYPE_ACCOUNT
RawAndExplodedColumn.CONTENT_TYPE_AD
RawAndExplodedColumn.CONTENT_TYPE_APP
RawAndExplodedColumn.CONTENT_TYPE_AUDIO
RawAndExplodedColumn.CONTENT_TYPE_HASHTAG
RawAndExplodedColumn.CONTENT_TYPE_IMAGE
RawAndExplodedColumn.CONTENT_TYPE_LINK
RawAndExplodedColumn.CONTENT_TYPE_OTHER
RawAndExplodedColumn.CONTENT_TYPE_PRODUCT
RawAndExplodedColumn.CONTENT_TYPE_STICKER
RawAndExplodedColumn.CONTENT_TYPE_SYNTHETIC_MEDIA
RawAndExplodedColumn.CONTENT_TYPE_TEXT
RawAndExplodedColumn.CONTENT_TYPE_VIDEO
RawAndExplodedColumn.DECISION_ACCOUNT_SUSPENDED
RawAndExplodedColumn.DECISION_ACCOUNT_TERMINATED
RawAndExplodedColumn.DECISION_MONETARY_OTHER
RawAndExplodedColumn.DECISION_MONETARY_SUSPENSION
RawAndExplodedColumn.DECISION_MONETARY_TERMINATION
RawAndExplodedColumn.DECISION_PROVISION_PARTIAL_SUSPENSION
RawAndExplodedColumn.DECISION_PROVISION_PARTIAL_TERMINATION
RawAndExplodedColumn.DECISION_PROVISION_TOTAL_SUSPENSION
RawAndExplodedColumn.DECISION_PROVISION_TOTAL_TERMINATION
RawAndExplodedColumn.DECISION_VISIBILITY_CONTENT_AGE_RESTRICTED
RawAndExplodedColumn.DECISION_VISIBILITY_CONTENT_DEMOTED
RawAndExplodedColumn.DECISION_VISIBILITY_CONTENT_DISABLED
RawAndExplodedColumn.DECISION_VISIBILITY_CONTENT_INTERACTION_RESTRICTED
RawAndExplodedColumn.DECISION_VISIBILITY_CONTENT_LABELLED
RawAndExplodedColumn.DECISION_VISIBILITY_CONTENT_REMOVED
RawAndExplodedColumn.DECISION_VISIBILITY_OTHER
RawAndExplodedColumn.KEYWORD_ADULT_SEXUAL_MATERIAL
RawAndExplodedColumn.KEYWORD_AGE_SPECIFIC_RESTRICTIONS
RawAndExplodedColumn.KEYWORD_AGE_SPECIFIC_RESTRICTIONS_MINORS
RawAndExplodedColumn.KEYWORD_ANIMAL_HARM
RawAndExplodedColumn.KEYWORD_BIOMETRIC_DATA_BREACH
RawAndExplodedColumn.KEYWORD_CHILD_SEXUAL_ABUSE_MATERIAL
RawAndExplodedColumn.KEYWORD_CONTENT_PROMOTING_EATING_DISORDERS
RawAndExplodedColumn.KEYWORD_COORDINATED_HARM
RawAndExplodedColumn.KEYWORD_COPYRIGHT_INFRINGEMENT
RawAndExplodedColumn.KEYWORD_DANGEROUS_TOYS
RawAndExplodedColumn.KEYWORD_DATA_FALSIFICATION
RawAndExplodedColumn.KEYWORD_DEFAMATION
RawAndExplodedColumn.KEYWORD_DESIGN_INFRINGEMENT
RawAndExplodedColumn.KEYWORD_DISCRIMINATION
RawAndExplodedColumn.KEYWORD_DISINFORMATION
RawAndExplodedColumn.KEYWORD_FOREIGN_INFORMATION_MANIPULATION
RawAndExplodedColumn.KEYWORD_GENDER_BASED_VIOLENCE
RawAndExplodedColumn.KEYWORD_GEOGRAPHICAL_REQUIREMENTS
RawAndExplodedColumn.KEYWORD_GEOGRAPHIC_INDICATIONS_INFRINGEMENT
RawAndExplodedColumn.KEYWORD_GOODS_SERVICES_NOT_PERMITTED
RawAndExplodedColumn.KEYWORD_GROOMING_SEXUAL_ENTICEMENT_MINORS
RawAndExplodedColumn.KEYWORD_HATE_SPEECH
RawAndExplodedColumn.KEYWORD_HUMAN_EXPLOITATION
RawAndExplodedColumn.KEYWORD_HUMAN_TRAFFICKING
RawAndExplodedColumn.KEYWORD_ILLEGAL_ORGANIZATIONS
RawAndExplodedColumn.KEYWORD_IMAGE_BASED_SEXUAL_ABUSE
RawAndExplodedColumn.KEYWORD_IMPERSONATION_ACCOUNT_HIJACKING
RawAndExplodedColumn.KEYWORD_INAUTHENTIC_ACCOUNTS
RawAndExplodedColumn.KEYWORD_INAUTHENTIC_LISTINGS
RawAndExplodedColumn.KEYWORD_INAUTHENTIC_USER_REVIEWS
RawAndExplodedColumn.KEYWORD_INCITEMENT_VIOLENCE_HATRED
RawAndExplodedColumn.KEYWORD_INSUFFICIENT_INFORMATION_TRADERS
RawAndExplodedColumn.KEYWORD_LANGUAGE_REQUIREMENTS
RawAndExplodedColumn.KEYWORD_MISINFORMATION
RawAndExplodedColumn.KEYWORD_MISSING_PROCESSING_GROUND
RawAndExplodedColumn.KEYWORD_NON_CONSENSUAL_IMAGE_SHARING
RawAndExplodedColumn.KEYWORD_NON_CONSENSUAL_ITEMS_DEEPFAKE
RawAndExplodedColumn.KEYWORD_NUDITY
RawAndExplodedColumn.KEYWORD_ONLINE_BULLYING_INTIMIDATION
RawAndExplodedColumn.KEYWORD_OTHER
RawAndExplodedColumn.KEYWORD_PATENT_INFRINGEMENT
RawAndExplodedColumn.KEYWORD_PHISHING
RawAndExplodedColumn.KEYWORD_PYRAMID_SCHEMES
RawAndExplodedColumn.KEYWORD_REGULATED_GOODS_SERVICES
RawAndExplodedColumn.KEYWORD_RIGHT_TO_BE_FORGOTTEN
RawAndExplodedColumn.KEYWORD_RISK_ENVIRONMENTAL_DAMAGE
RawAndExplodedColumn.KEYWORD_RISK_PUBLIC_HEALTH
RawAndExplodedColumn.KEYWORD_SELF_MUTILATION
RawAndExplodedColumn.KEYWORD_STALKING
RawAndExplodedColumn.KEYWORD_SUICIDE
RawAndExplodedColumn.KEYWORD_TERRORIST_CONTENT
RawAndExplodedColumn.KEYWORD_TRADEMARK_INFRINGEMENT
RawAndExplodedColumn.KEYWORD_TRADE_SECRET_INFRINGEMENT
RawAndExplodedColumn.KEYWORD_UNLAWFUL_SALE_ANIMALS
RawAndExplodedColumn.KEYWORD_UNSAFE_CHALLENGES
RawAndExplodedColumn.STATEMENT_CATEGORY_ANIMAL_WELFARE
RawAndExplodedColumn.STATEMENT_CATEGORY_DATA_PROTECTION_AND_PRIVACY_VIOLATIONS
RawAndExplodedColumn.STATEMENT_CATEGORY_ILLEGAL_OR_HARMFUL_SPEECH
RawAndExplodedColumn.STATEMENT_CATEGORY_INTELLECTUAL_PROPERTY_INFRINGEMENTS
RawAndExplodedColumn.STATEMENT_CATEGORY_NEGATIVE_EFFECTS_ON_CIVIC_DISCOURSE_OR_ELECTIONS
RawAndExplodedColumn.STATEMENT_CATEGORY_NON_CONSENSUAL_BEHAVIOUR
RawAndExplodedColumn.STATEMENT_CATEGORY_PORNOGRAPHY_OR_SEXUALIZED_CONTENT
RawAndExplodedColumn.STATEMENT_CATEGORY_PROTECTION_OF_MINORS
RawAndExplodedColumn.STATEMENT_CATEGORY_RISK_FOR_PUBLIC_SECURITY
RawAndExplodedColumn.STATEMENT_CATEGORY_SCAMS_AND_FRAUD
RawAndExplodedColumn.STATEMENT_CATEGORY_SCOPE_OF_PLATFORM_SERVICE
RawAndExplodedColumn.STATEMENT_CATEGORY_SELF_HARM
RawAndExplodedColumn.STATEMENT_CATEGORY_UNSAFE_AND_ILLEGAL_PRODUCTS
RawAndExplodedColumn.STATEMENT_CATEGORY_VIOLENCE
RawAndExplodedColumn.account_type
RawAndExplodedColumn.application_date
RawAndExplodedColumn.automated_decision
RawAndExplodedColumn.automated_detection
RawAndExplodedColumn.category
RawAndExplodedColumn.category_addition
RawAndExplodedColumn.category_specification
RawAndExplodedColumn.category_specification_other
RawAndExplodedColumn.content_date
RawAndExplodedColumn.content_language
RawAndExplodedColumn.content_type
RawAndExplodedColumn.content_type_other
RawAndExplodedColumn.created_at
RawAndExplodedColumn.decision_account
RawAndExplodedColumn.decision_facts
RawAndExplodedColumn.decision_ground
RawAndExplodedColumn.decision_ground_reference_url
RawAndExplodedColumn.decision_monetary
RawAndExplodedColumn.decision_monetary_other
RawAndExplodedColumn.decision_provision
RawAndExplodedColumn.decision_visibility
RawAndExplodedColumn.decision_visibility_other
RawAndExplodedColumn.end_date_account_restriction
RawAndExplodedColumn.end_date_monetary_restriction
RawAndExplodedColumn.end_date_service_restriction
RawAndExplodedColumn.end_date_visibility_restriction
RawAndExplodedColumn.illegal_content_explanation
RawAndExplodedColumn.illegal_content_legal_ground
RawAndExplodedColumn.incompatible_content_explanation
RawAndExplodedColumn.incompatible_content_ground
RawAndExplodedColumn.incompatible_content_illegal
RawAndExplodedColumn.platform_name
RawAndExplodedColumn.platform_uid
RawAndExplodedColumn.source_identity
RawAndExplodedColumn.source_type
RawAndExplodedColumn.territorial_scope
RawAndExplodedColumn.uuid
RawAndExplodedColumns
SourceType
TDB_agg_data_folder_prefix
TDB_agg_data_format
TDB_agg_data_versions
TDB_chunkFormat
TDB_columnsFull
TDB_columnsFull.account_type
TDB_columnsFull.application_date
TDB_columnsFull.automated_decision
TDB_columnsFull.automated_detection
TDB_columnsFull.category
TDB_columnsFull.category_addition
TDB_columnsFull.category_specification
TDB_columnsFull.category_specification_other
TDB_columnsFull.content_date
TDB_columnsFull.content_language
TDB_columnsFull.content_type
TDB_columnsFull.content_type_other
TDB_columnsFull.created_at
TDB_columnsFull.decision_account
TDB_columnsFull.decision_facts
TDB_columnsFull.decision_ground
TDB_columnsFull.decision_ground_reference_url
TDB_columnsFull.decision_monetary
TDB_columnsFull.decision_monetary_other
TDB_columnsFull.decision_provision
TDB_columnsFull.decision_visibility
TDB_columnsFull.decision_visibility_other
TDB_columnsFull.end_date_account_restriction
TDB_columnsFull.end_date_monetary_restriction
TDB_columnsFull.end_date_service_restriction
TDB_columnsFull.end_date_visibility_restriction
TDB_columnsFull.illegal_content_explanation
TDB_columnsFull.illegal_content_legal_ground
TDB_columnsFull.incompatible_content_explanation
TDB_columnsFull.incompatible_content_ground
TDB_columnsFull.incompatible_content_illegal
TDB_columnsFull.platform_name
TDB_columnsFull.platform_uid
TDB_columnsFull.source_identity
TDB_columnsFull.source_type
TDB_columnsFull.territorial_scope
TDB_columnsFull.uuid
TDB_columnsLight
TDB_columnsLight.account_type
TDB_columnsLight.application_date
TDB_columnsLight.automated_decision
TDB_columnsLight.automated_detection
TDB_columnsLight.category
TDB_columnsLight.category_addition
TDB_columnsLight.category_specification
TDB_columnsLight.category_specification_other
TDB_columnsLight.content_date
TDB_columnsLight.content_language
TDB_columnsLight.content_type
TDB_columnsLight.content_type_other
TDB_columnsLight.created_at
TDB_columnsLight.decision_account
TDB_columnsLight.decision_ground
TDB_columnsLight.decision_ground_reference_url
TDB_columnsLight.decision_monetary
TDB_columnsLight.decision_monetary_other
TDB_columnsLight.decision_provision
TDB_columnsLight.decision_visibility
TDB_columnsLight.decision_visibility_other
TDB_columnsLight.end_date_account_restriction
TDB_columnsLight.end_date_monetary_restriction
TDB_columnsLight.end_date_service_restriction
TDB_columnsLight.end_date_visibility_restriction
TDB_columnsLight.illegal_content_legal_ground
TDB_columnsLight.incompatible_content_ground
TDB_columnsLight.incompatible_content_illegal
TDB_columnsLight.platform_name
TDB_columnsLight.platform_uid
TDB_columnsLight.source_identity
TDB_columnsLight.source_type
TDB_columnsLight.uuid
TDB_dailyDumpsVersion
TDB_datetimeColumns
TDB_datetimeColumns.application_date
TDB_datetimeColumns.content_date
TDB_datetimeColumns.created_at
TDB_datetimeColumns.end_date_account_restriction
TDB_datetimeColumns.end_date_monetary_restriction
TDB_datetimeColumns.end_date_service_restriction
TDB_datetimeColumns.end_date_visibility_restriction
TDB_freetextColumns
TDB_freetextColumns.category_specification_other
TDB_freetextColumns.content_type_other
TDB_freetextColumns.decision_facts
TDB_freetextColumns.decision_monetary_other
TDB_freetextColumns.decision_visibility_other
TDB_freetextColumns.illegal_content_explanation
TDB_freetextColumns.incompatible_content_explanation
TerritorialScope
TerritorialScope.AT
TerritorialScope.BE
TerritorialScope.BG
TerritorialScope.CY
TerritorialScope.CZ
TerritorialScope.DE
TerritorialScope.DK
TerritorialScope.EE
TerritorialScope.EEA
TerritorialScope.EEA_no_IS
TerritorialScope.ES
TerritorialScope.EU
TerritorialScope.FI
TerritorialScope.FR
TerritorialScope.GR
TerritorialScope.HR
TerritorialScope.HU
TerritorialScope.IE
TerritorialScope.IS
TerritorialScope.IT
TerritorialScope.LI
TerritorialScope.LT
TerritorialScope.LU
TerritorialScope.LV
TerritorialScope.MT
TerritorialScope.NL
TerritorialScope.NO
TerritorialScope.PL
TerritorialScope.PT
TerritorialScope.RO
TerritorialScope.SE
TerritorialScope.SI
TerritorialScope.SK
UseColumns
all_columns
all_columns_light
columns_common_prefixes
columns_to_explode
datetime_columns
datetime_format
datetime_format_strftime
territorial_scopes
- dsa_tdb.utils module
Module contents
The dsa_tdb module documentation.
The dsa_tdb module provides a set of tools to interact with the DSA Transparency Database (TDB) data. It provides a set of classes and functions to fetch, extract, transform, filter and load data from the TDB.
It internally uses PySpark to handle the data at scale, even on regular computers, and can easily be integrated into pipelines that use pandas or other data manipulation libraries.
- class dsa_tdb.TDB_DataFrame(spark: SparkSession | None = None)
Bases: object
The base class for the TDB DataFrame object. This class is used to load the TDB data into a DataFrame and perform operations on it. The supported operations are filtering and aggregating the data and exporting it to other formats.
The class is initialized with a SparkSession object and provides a set of methods to load data from the TDB.
The underlying DataFrame object is a Spark DataFrame, accessible through the df attribute, and can be used as such.
- aggregate_SoRs(columns_to_group: List[RawAndExplodedColumn] | None = None, horizontally_explode_columns: bool | None = None, delete_original_columns: bool | None = None, normalize_platform_name: bool | None = None, platforms_to_exclude: List[str] | None = None, platforms_to_include: List[str] | None = None, created_at_dt_floor: str | None = None, config_file: str | None = None, **kwargs)
Aggregates the SoRs from the dataframe. The configuration can be passed either using the provided and additional keyword arguments or by providing a configuration file in config_file. Note that if both are provided, the keyword arguments will take precedence.
- Parameters:
columns_to_group (Union[List[T.RawAndExplodedColumn],None], optional) – The columns to group the data by, by default None will use all except uuid and platform_uid.
horizontally_explode_columns (bool, optional) – Whether to horizontally explode the columns with nested structures, by default True.
delete_original_columns (bool, optional) – Whether to delete the original columns after horizontally exploding them, by default False.
normalize_platform_name (bool, optional) – Whether to normalize the platform name to lowercase, by default False.
platforms_to_exclude (Union[List[str],None], optional) – The platforms to exclude from the data, by default None.
platforms_to_include (Union[List[str],None], optional) – The platforms to include in the data, by default None.
created_at_dt_floor (Union[str,None], optional) – The floor to round the created_at datetime to, by default None.
config_file (Union[str,None], optional) – The path to a configuration file, by default None.
**kwargs (dict) – The aggregation arguments. These are all the remaining entries of dsa_tdb.types.AggregationConfig that are not directly exposed in the function arguments.
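The precedence rule described above (explicit keyword arguments win over values read from config_file) can be pictured with a plain-Python sketch. This is not the library's implementation: the helper `resolve_aggregation_options` and the sample values are hypothetical, and only the option names are taken from the signature above.

```python
from typing import Any, Dict, Optional


def resolve_aggregation_options(
    file_options: Optional[Dict[str, Any]] = None,
    **keyword_options: Any,
) -> Dict[str, Any]:
    """Merge options so that explicitly passed keywords override file values."""
    merged = dict(file_options or {})
    # Assumption: a keyword left at None means "not provided", mirroring the
    # None defaults in the aggregate_SoRs signature.
    merged.update({k: v for k, v in keyword_options.items() if v is not None})
    return merged


# Hypothetical values standing in for the content of a configuration file.
file_options = {
    "normalize_platform_name": False,
    "platforms_to_exclude": ["ExamplePlatform"],
}

options = resolve_aggregation_options(
    file_options,
    normalize_platform_name=True,        # overrides the file value
    horizontally_explode_columns=None,   # not provided, so it is left out
)
```

Given an already loaded TDB_DataFrame (say, a variable named `tdb`), a dictionary resolved this way could be passed on as `tdb.aggregate_SoRs(**options)`.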
- property columns
- filter_SoRs(columns_to_import: List[TDB_columnsFull] | None = None, horizontally_explode_columns: bool = True, delete_original_columns: bool = False, normalize_platform_name: bool = False, platforms_to_exclude: List[str] | None = None, platforms_to_include: List[str] | None = None, created_at_dt_floor: str | None = None, config_file: str | None = None, **kwargs)
Filters the SoRs from the dataframe. The configuration can be passed either using the provided and additional keyword arguments or by providing a configuration file in config_file. Note that if both are provided, the keyword arguments will take precedence.
- Parameters:
columns_to_import (Union[List[T.TDB_columnsFull],None], optional) – The columns to import from the dataframe, by default None.
horizontally_explode_columns (bool, optional) – Whether to horizontally explode the columns with nested structures, by default True.
delete_original_columns (bool, optional) – Whether to delete the original columns after horizontally exploding them, by default False.
normalize_platform_name (bool, optional) – Whether to normalize the platform name to lowercase, by default False.
platforms_to_exclude (Union[List[str],None], optional) – The platforms to exclude from the data, by default None.
platforms_to_include (Union[List[str],None], optional) – The platforms to include in the data, by default None.
created_at_dt_floor (Union[str,None], optional) – The floor to round the created_at datetime to, by default None.
config_file (Union[str,None], optional) – The path to a configuration file, by default None.
**kwargs (dict) – The filter arguments. These are all the remaining entries of dsa_tdb.types.FilteringConfig that are not directly exposed in the function arguments.
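A minimal sketch of a call to filter_SoRs that uses only keyword arguments from the signature above. The wrapper `filter_platforms` is hypothetical, and the overlap check is a caller-side sanity check, not documented library behaviour. The wrapper accepts any object exposing a filter_SoRs method and returns whatever that method returns, since the return value is not documented here.

```python
from typing import Any, List, Optional


def filter_platforms(
    tdb: Any,
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Any:
    """Call filter_SoRs with documented keywords and pass its result through."""
    # Caller-side check (an assumption, not something the library is known to do).
    if include and exclude and set(include) & set(exclude):
        raise ValueError("a platform cannot be both included and excluded")
    return tdb.filter_SoRs(
        platforms_to_include=include,
        platforms_to_exclude=exclude,
        normalize_platform_name=True,
        horizontally_explode_columns=True,
        delete_original_columns=False,
    )
```

With a loaded TDB_DataFrame named `tdb`, this would be called as `filter_platforms(tdb, include=["platform_a"])`, where the platform name is a placeholder.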
- head(n: int = 1)
- loadData(root_folder: str, platform: str, version: TDB_dailyDumpsVersion = TDB_dailyDumpsVersion.full, platforms_to_exclude: List[str] | None = None, start_date: str | None = None, end_date: str | None = None, columns_to_import: List[str] | None = None, explode_columns: bool = False, delete_original: bool = True, fillna_str: str | None = None, fillna_bool: bool | None = False, input_format: TDB_chunkFormat = TDB_chunkFormat.parquet, content_date_range: List[str] | List[datetime] | None = None, decision_date_range: List[str] | List[datetime] | None = None, created_at_date_range: List[str] | List[datetime] | None = None, override_chunked_subfolder: str = 'daily_dumps_chunked', compute_restriction_duration: bool = False, normalize_platform_name: bool = True, normalize_content_type_other: bool = False)
Load data from the TDB into a Spark DataFrame.
This method loads the data from the TDB daily dumps into a Spark DataFrame. The data is loaded from the files in the specified root folder, for the specified platform and version. The data is filtered and transformed according to the specified options.
- Parameters:
root_folder (str) – The root folder where the daily dumps for each platform and version are stored.
platform (str) – The platform to load the data from.
version (T.TDB_dailyDumpsVersion, optional) – The version of the daily dumps to load, by default dsa_tdb.types.TDB_dailyDumpsVersion.full.
platforms_to_exclude (Union[List[str],None], optional) – A list of platforms to exclude from the data, by default None.
start_date (Union[str,None], optional) – The start date to load the data from, by default None.
end_date (Union[str,None], optional) – The end date to load the data to, by default None.
columns_to_import (Union[List[str],None], optional) – The list of columns to import from the daily dumps, by default None.
explode_columns (bool, optional) – Whether to horizontally explode the columns with nested structures, by default False.
delete_original (bool, optional) – Whether to delete the original columns after horizontally exploding them, by default True.
fillna_str (Union[str,None], optional) – The value to fill the missing string values with, by default None.
fillna_bool (Union[bool,None], optional) – The value to fill the missing boolean values with, by default False.
input_format (T.TDB_chunkFormat, optional) – The format of the daily dump files, by default T.TDB_chunkFormat.parquet.
content_date_range (Union[List[str],List[datetime],None], optional) – The date range to filter the content_date column, by default None.
decision_date_range (Union[List[str],List[datetime],None], optional) – The date range to filter the decision_date column, by default None.
created_at_date_range (Union[List[str],List[datetime],None], optional) – The date range to filter the created_at column, by default None.
override_chunked_subfolder (str, optional) – The subfolder where the chunked files are stored, by default dsa_tdb.types.CHUNKED_FILES_SUBFOLDER_NAME. Do not change this unless you know what you are doing.
compute_restriction_duration (bool, optional) – Whether to compute the restriction duration from the restriction_start and restriction_end columns, by default False.
normalize_platform_name (bool, optional) – Whether to normalize the platform name to lowercase, by default True.
normalize_content_type_other (bool, optional) – Whether to normalize the content_type_other column to lowercase, by default False.
- Raises:
ValueError – If the content_date_range, decision_date_range or created_at_date_range have more than two elements.
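The following sketch mirrors the documented ValueError on the caller's side before delegating to loadData. The helpers `check_date_range` and `load_platform` are hypothetical; the keyword names come from the signature above, and the exact message raised by the library is not known here.

```python
from datetime import datetime
from typing import Any, List, Optional, Union

DateRange = Optional[Union[List[str], List[datetime]]]


def check_date_range(name: str, value: DateRange) -> DateRange:
    """Reject ranges with more than two elements, as the docs say loadData does."""
    if value is not None and len(value) > 2:
        raise ValueError(f"{name} must have at most two elements, got {len(value)}")
    return value


def load_platform(
    tdb: Any,
    root_folder: str,
    platform: str,
    created_at_date_range: DateRange = None,
) -> Any:
    """Validate the range locally, then call loadData with documented keywords."""
    return tdb.loadData(
        root_folder=root_folder,
        platform=platform,
        created_at_date_range=check_date_range(
            "created_at_date_range", created_at_date_range
        ),
        explode_columns=False,
        normalize_platform_name=True,
    )
```

With a TDB_DataFrame instance `tdb`, a call could look like `load_platform(tdb, "/path/to/data", "some_platform", [datetime(2024, 1, 1), datetime(2024, 1, 31)])`. The path and platform name are placeholders, and reading the two elements as start and end of the range is an assumption.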
- loadParquet(path: str | List[str])
Load one or more parquet files into the DataFrame.
- Parameters:
path (str, List[str]) – The path to the parquet file. Can also be a pattern to load multiple files or a list of paths.
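Since path accepts either a single string (possibly a pattern) or a list of strings, a small caller-side helper can normalise pathlib.Path objects before the call. This is a sketch only: `as_parquet_argument` and `load_parquet` are hypothetical names, and whether loadParquet itself accepts Path objects is not stated above.

```python
from pathlib import Path
from typing import Any, Iterable, List, Union

PathLike = Union[str, Path]


def as_parquet_argument(
    paths: Union[PathLike, Iterable[PathLike]],
) -> Union[str, List[str]]:
    """Return a str for one path or pattern, or a list of str for several paths."""
    if isinstance(paths, (str, Path)):
        return str(paths)
    return [str(p) for p in paths]


def load_parquet(tdb: Any, paths: Union[PathLike, Iterable[PathLike]]) -> Any:
    """Pass the normalised argument to loadParquet and return its result."""
    return tdb.loadParquet(as_parquet_argument(paths))
```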
- sample(n: int = 1)
- schema()
- show(n: int = 20)
- text_filter(column: str, expr: str)
Filter the DataFrame using a text expression.
- Parameters:
column (str) – The column to filter on.
expr (str) – The expression to filter with. Can also be a regular expression.
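As a sketch of how an expression for text_filter might be assembled, the helper below joins literal terms into a regular-expression alternation. The helper names are hypothetical. It assumes the expression is evaluated as a regular expression; the regex dialect used by the underlying engine may differ from Python's, so only simple literal terms are used. The column decision_facts is one of the free-text columns listed in dsa_tdb.types.

```python
import re
from typing import Any, Iterable


def any_of(terms: Iterable[str]) -> str:
    """Build an alternation pattern that matches any of the literal terms."""
    # re.escape guards against regex metacharacters inside the terms.
    return "|".join(re.escape(term) for term in terms)


def filter_free_text(tdb: Any, column: str, terms: Iterable[str]) -> Any:
    """Call text_filter with the documented column and expr arguments."""
    return tdb.text_filter(column=column, expr=any_of(terms))
```

With a loaded TDB_DataFrame named `tdb`, this could be used as `filter_free_text(tdb, "decision_facts", ["scam", "phishing"])`.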
- toPandas() DataFrame