📄Post Sorting

This page describes how posts are sorted on Matar.


Sort Parent Posts

Post sorting in Matar utilizes Flask-SQLAlchemy which provides various attributes on our models to get back a new query object over records. Here is how a post-sorting function looks like:

def tmp_sort_posts():
    posts_sorted = 0
    for post in Post.query.filter(
        Post.parent_post_id == None,
        Post.sorted_at < cur_ms() - 30 * 60 * 1000
    ).limit(50):
        post.sort_child_posts()
        posts_sorted += 1

    return {
        "message": "posts sorted",
        "posts_sorted": posts_sorted
    }

Describing Code:

  • A posts_sorted counter has been initalized to 0 for keeping a check on how many posts are sorted.

  • Inside the for loop, with a **query SQLAlchemy **attribute select records where following conditions are met: ** - Post.parent_post_id **is **None **(select posts that do not have a parent post)(posts with questions) - **Post.sorted_at **is less than the current timestamp minus 30 minutes **cur_ms() **function is used to get the current timestamp in milliseconds .More explanation on the **cur_ms() **function here.

  • Limits result to the first 50 records where the above conditions are met.

  • Also inside the for loop, **post.sort_child_posts() **method is called on each post which sorts child posts related to the current post.

  • Increment the posts_sorted variable by 1 for each post processed in the loop.

  • Finally returns a dictionary containing a message and key value pair that the posts were sorted and the number of posts that were sorted, stored in posts_sorted variable.


Sort Child(Answer) Posts

  • Here is sort_child_posts definition which has the following functionality:

  def sort_child_posts(self):

        child_posts_map = {}

        for child_post in Post.query.filter(
            Post.parent_post_id == self._id
        ):
            post_id = child_post._id
            if(not child_posts_map.get(post_id)):
                child_posts_map[post_id] = {
                    "block": 0,
                    "upvote": 0,
                    "downvote": 0,
                    "like": 0
                }

        # get Post Activity type  for each child post

        post_activities = db.session.query(PostActivity).join(
            Post,
            Post._id == PostActivity.post_id
        ).filter(
            PostActivity.post_activity_type.in_(["upvote", "downvote", "block", "like"])
        ).filter(
            Post.parent_post_id == self._id
        )

        # compute child post activities
        for post_activity in post_activities:
            post_id = post_activity.post_id
            post_activity_type = post_activity.post_activity_type
            # if new activity in between
            if(not child_posts_map.get(post_id)):
                child_posts_map[post_id] = {
                    "block": 0,
                    "upvote": 0,
                    "downvote": 0,
                    "like": 0
                }
            child_posts_map[post_id][post_activity_type] += 1

        # update child posts sort order and data
        child_posts_count = 0
        for child_post in Post.query.filter(
            Post.parent_post_id == self._id
        ):
            if(not child_posts_map.get(child_post._id)):
                continue
            # compute sort order
            child_post.sort_order = child_post.compute_sort_order(
                post_activity_map=child_posts_map.get(child_post._id)
            )
            _child_post_data = child_post.data
            _child_post_data.update(child_posts_map.get(child_post._id))
            child_post.update_json(Post.data, _child_post_data)

            child_posts_count += 1

        self.sorted_at = cur_ms()
        normalized_timestamp = normalize_value(int(self.created_at.timestamp() * 1000), MIN_TIMESTAMP, MAX_TIMESTAMP)
        normalized_subposts = normalize_value(child_posts_count, 0, 1000)

        self.sort_order = int((70 * normalized_timestamp + 30 * normalized_subposts) * 1000)
        self.commit()
                

Describing sort_child_posts Code:

  • An empty map child_posts_map is defined to store information about child posts.

  • Query the Posts table and perform the following actions:

  • Query the Posts table and perform the following actions:

    1. For each child post, extracts its _id and adds an entry to child_posts_map with initial values for "block," "upvote," "downvote," and "like" set to 0.

    2. If the post already exists in child_posts_map donot create a new entry but query the database to fetch post activities related to child posts. such as "upvote," "downvote," "block," or "like."

    3. For each post activity, update the entry in child_posts_map based on the activity type and post ID.

    4. Also iterate through the child posts again and update their sort order and data based on the computed values in child_posts_map

    5. Update the sorted_at attribute of the parent post if child posts are sorted.

  • Compute a sort_order value for the parent post based on a combination of the timestamp of its creation and the number of child posts. The specific calculation appears to be a weighted sum of the normalized timestamp and normalized subpost count.

  • Finally commit the changes to the parent post and save them to db


Sort Post Order

def compute_sort_order(self, post_activity_map=None):
        if(not post_activity_map or not isinstance(post_activity_map, dict)):
            return
        W_LIKEABILITY = 0.5
        W_BLOCK = -0.55
        W_UPVOTE = 0.7
        W_DOWNVOTE = 0.7
        W_AGE = -0.01

        age_days_count = (datetime.utcnow() - self.created_at).days
        block_count = post_activity_map.get("block") or 0
        upvote_count = post_activity_map.get("like") or 0
        downvote_count = post_activity_map.get("downvote") or 0
        likeability_count = upvote_count - downvote_count # shantanus param

        sort_order = block_count * W_BLOCK + upvote_count * W_UPVOTE + likeability_count * W_LIKEABILITY + age_days_count * W_AGE

        return sort_order

Describing compute_sort_order code:

  1. Input Parameters: The method takes an optional argument post_activity_map, which is expected to be a dictionary. This dictionary is assumed to contain information about the post's activity, such as the number of likes, dislikes, and blocks. If post_activity_map is not provided or is not a dictionary, the method returns None.

  2. Weight Constants: Several weight constants (W_LIKEABILITY, W_BLOCK, W_UPVOTE, W_DOWNVOTE, and W_AGE) are defined at the beginning of the method. For example, W_LIKEABILITY is the weight assigned to the likeability factor.

  3. Age Calculation: The code calculates the age of the post in days by subtracting the post's creation time (self.created_at) from the current UTC time (datetime.utcnow()). This age in days is stored in the age_days_count variable.

  4. Activity Counts: The code then retrieves the counts of various activities (blocks, upvotes, downvotes, and likes) from the post_activity_map dictionary. If any of these counts are missing in the dictionary, it defaults to 0.

  5. Likeability Score: The likeability_count is calculated as the difference between the number of upvotes and downvotes. This can be considered a measure of the post's overall likeability.

  6. Sort Order Calculation: The sort_order is calculated by multiplying each activity count by its respective weight constant and summing them up. Additionally, the age of the post is factored in by subtracting its age in days multiplied by W_AGE.

  7. Return Value: The computed sort_order value is returned as the result of the method.

Overall, this method computes a single numeric value (sort_order) that represents the "sorting order" of a post based on a combination of factors such as the post's age, likeability, blocks, upvotes, and downvotes.


Convert Current Time to Milliseconds

def cur_ms():
    epoch = datetime.utcfromtimestamp(0)
    current_timestamp = datetime.utcnow()
    return int((current_timestamp - epoch).total_seconds() * 1000)

Code Summary :

In summary, when you call the cur_ms function, it returns the current timestamp in milliseconds since the Unix epoch. This can be useful for various time-related operations and for recording the time when an event occurs.

Last updated