near_recommender.src.features package#

Submodules#

near_recommender.src.features.preprocessors module#

near_recommender.src.features.preprocessors.normalize_corpus(docs, remove_links=None)#
near_recommender.src.features.preprocessors.normalize_document(doc, remove_links=None)#

near_recommender.src.features.related_profile_tags module#

near_recommender.src.features.related_profile_tags.find_similar_users(profiles, col_agg_tags, user, top_k)#

Given a DataFrame of user profiles, the name of the column holding the aggregated tags, a target user, and a number k, this function returns a list of the k user profiles most similar to the target user, along with their similarity scores.

Parameters:
  • profiles (DataFrame) – the user profiles.

  • col_agg_tags (str) – the name of the column containing the aggregated tags.

  • user (str) – the target user.

  • top_k (int) – the number of similar users to return.

Returns:

A list of the k user profiles most similar to the target user, along with their similarity scores. Each user profile is represented as a dictionary with two keys: “score” and “similar_profile”. The value associated with the “score” key is a float representing the cosine similarity between the target user and the similar user.

Return type:

List[Dict[str, any]]

Raises:

ValueError – if the value of idx is invalid or if no similar users are found for the given input.
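The ranking described above can be illustrated with a small, dependency-free sketch. It is not the package's implementation: the helper name `sketch_find_similar_users`, the plain dict used in place of the DataFrame, and the choice of raw tag counts as vectors are all assumptions made for illustration. Only the output shape (a list of dictionaries with “score” and “similar_profile” keys, ranked by cosine similarity) and the ValueError cases follow the description above.

```python
import math
from collections import Counter
from typing import Any, Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sketch_find_similar_users(
    tags_by_user: Dict[str, str], user: str, top_k: int
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: rank other users by tag similarity to `user`."""
    if user not in tags_by_user:
        raise ValueError(f"unknown user: {user!r}")

    # Assumption: aggregated tags are a whitespace-separated string.
    target = Counter(tags_by_user[user].split())
    scored = [
        {"score": cosine(target, Counter(tags.split())), "similar_profile": other}
        for other, tags in tags_by_user.items()
        if other != user
    ]
    # Drop users that share no tags with the target.
    scored = [entry for entry in scored if entry["score"] > 0.0]
    if not scored:
        raise ValueError(f"no similar users found for {user!r}")

    scored.sort(key=lambda entry: entry["score"], reverse=True)
    return scored[:top_k]


# Toy data; the account names and tags are made up.
tags_by_user = {
    "alice.near": "rust defi nft",
    "bob.near": "rust defi",
    "carol.near": "art music",
    "dave.near": "nft art",
}
result = sketch_find_similar_users(tags_by_user, "alice.near", top_k=2)
```

Here `result` holds at most `top_k` entries, highest score first. Users with no tags in common with the target are excluded rather than returned with a score of zero, which is one possible reading of "no similar users are found".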

near_recommender.src.features.top_sentences module#

near_recommender.src.features.top_sentences.return_similar_sentences(query, model_embedder, corpus_embeddings, top_k, df, sentences)#

Returns the top k most similar sentences in the corpus to the given query, along with their similarity scores, associated DataFrame row, and username.

Parameters:
  • query (str) – the query to compare against the corpus sentences.

  • model_embedder (SentenceTransformer) – a sentence embedding model used to encode the query and corpus sentences.

  • corpus_embeddings (Tensor) – a torch.Tensor holding the sentence embeddings for the corpus sentences.

  • top_k (int) – the number of top similar sentences to return.

  • df (DataFrame) – a pandas DataFrame containing the corresponding row for each sentence in the corpus.

  • sentences (List[str]) – the original sentences in the corpus.

Returns:

A list of tuples containing the top similar sentences, their similarity scores, the associated username, and the corresponding DataFrame row.

Return type:

List[Tuple[str, float, str, Series]]
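The top-k lookup described above can be sketched without the embedding model. This is an illustration under stated assumptions, not the package's code: the helper name `sketch_top_k_sentences` is hypothetical, the embeddings are hand-written toy vectors instead of SentenceTransformer output, cosine similarity is assumed as the scoring function, and a plain `usernames` list stands in for the DataFrame, so each returned tuple omits the row.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sketch_top_k_sentences(
    query_embedding: Sequence[float],
    corpus_embeddings: Sequence[Sequence[float]],
    sentences: List[str],
    usernames: List[str],
    top_k: int,
) -> List[Tuple[str, float, str]]:
    """Hypothetical stand-in: return (sentence, score, username) for the best matches."""
    scored = [
        (sentences[i], cosine(query_embedding, embedding), usernames[i])
        for i, embedding in enumerate(corpus_embeddings)
    ]
    # nlargest returns the top_k items sorted by score, highest first.
    return heapq.nlargest(top_k, scored, key=lambda item: item[1])


# Toy data; in the real function the query would be encoded by model_embedder.
sentences = ["staking on NEAR", "pictures of cats", "NEAR validators"]
usernames = ["alice.near", "bob.near", "carol.near"]
corpus_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_embedding = [1.0, 0.0]

top = sketch_top_k_sentences(
    query_embedding, corpus_embeddings, sentences, usernames, top_k=2
)
```

Each entry of `top` follows the first three positions of the documented tuple (sentence, score, username); attaching the matching DataFrame row would add the fourth.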

near_recommender.src.features.utils module#

near_recommender.src.features.utils.filter_last_post(signer_id)#
near_recommender.src.features.utils.get_index_from_signer_id(signer_id, dataset)#
near_recommender.src.features.utils.load_pretrained_model(base_filename)#
Parameters:

base_filename (str) –

near_recommender.src.features.utils.run_update_model(model, corpus)#
Parameters:
  • model (SentenceTransformer) –

  • corpus (List) –

Return type:

SentenceTransformer

near_recommender.src.features.utils.save_pretrained_model(base_filename, cxt)#
Parameters:
  • base_filename (str) –

  • cxt (str) –

Return type:

None

Module contents#