near_recommender.src.features package
Submodules
near_recommender.src.features.preprocessors module
- near_recommender.src.features.preprocessors.normalize_corpus(docs, remove_links=None)
- near_recommender.src.features.preprocessors.normalize_document(doc, remove_links=None)
near_recommender.src.features.top_sentences module
- near_recommender.src.features.top_sentences.return_similar_sentences(query, model_embedder, corpus_embeddings, top_k, df, sentences)
Returns the top_k corpus sentences most similar to the given query, along with their similarity scores, associated usernames, and corresponding DataFrame rows.
- Parameters:
query (str) – The query to compare with the corpus sentences.
model_embedder (SentenceTransformer) – A sentence embedding model used to encode the query and corpus sentences.
corpus_embeddings (Tensor) – A torch.Tensor holding the precomputed sentence embeddings for the corpus sentences.
top_k (int) – The number of top similar sentences to return.
df (DataFrame) – A pandas DataFrame containing the corresponding row for each sentence in the corpus.
sentences (List[str]) – The original sentences in the corpus.
- Returns:
A list of tuples, each containing a top similar sentence, its similarity score, the associated username, and the corresponding DataFrame row.
- Return type:
List[Tuple[str, float, str, Series]]
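The retrieval described above can be illustrated with a minimal sketch. This is not the package's implementation: it assumes cosine similarity as the scoring function, uses plain Python lists in place of a SentenceTransformer and torch.Tensor, and stands in a list of dicts for the DataFrame. The helper names `cosine_similarity` and `top_k_similar` and the `"username"` key are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k_similar(
    query_embedding: Sequence[float],
    corpus_embeddings: Sequence[Sequence[float]],
    top_k: int,
    rows: List[Dict[str, Any]],
    sentences: List[str],
) -> List[Tuple[str, float, str, Dict[str, Any]]]:
    """Hypothetical stand-in: rank corpus sentences by similarity to the query."""
    # Score every corpus embedding against the query, remembering its index.
    scored = [
        (cosine_similarity(query_embedding, emb), i)
        for i, emb in enumerate(corpus_embeddings)
    ]
    # Highest similarity first, then keep only the first top_k entries.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # "username" is an assumed key; the real column name is not documented here.
    return [
        (sentences[i], score, rows[i]["username"], rows[i])
        for score, i in scored[:top_k]
    ]


# Toy data: 2-dimensional embeddings chosen by hand, not produced by a model.
sentences = [
    "NEAR staking guide",
    "Cooking pasta at home",
    "How to stake NEAR tokens",
]
corpus_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
rows = [
    {"username": "alice.near"},
    {"username": "bob.near"},
    {"username": "carol.near"},
]

results = top_k_similar([1.0, 0.0], corpus_embeddings, 2, rows, sentences)
for sentence, score, username, _row in results:
    print(f"{score:.2f}  {username}  {sentence}")
```

Each result tuple follows the documented order: sentence, similarity score, username, row. In the real function, the query would presumably be encoded with `model_embedder` before scoring, and each row would be a pandas Series rather than a dict.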
near_recommender.src.features.utils module
- near_recommender.src.features.utils.filter_last_post(signer_id)
- near_recommender.src.features.utils.get_index_from_signer_id(signer_id, dataset)
- near_recommender.src.features.utils.load_pretrained_model(base_filename)
- Parameters:
base_filename (str) –
- near_recommender.src.features.utils.run_update_model(model, corpus)
- Parameters:
model (SentenceTransformer) –
corpus (List) –
- Return type:
SentenceTransformer
- near_recommender.src.features.utils.save_pretrained_model(base_filename, cxt)
- Parameters:
base_filename (str) –
cxt (str) –
- Return type:
None