Give posts (any type) higher priority in link search results #63683

Open
richtabor opened this issue Jul 17, 2024 · 12 comments · May be fixed by #68439
Labels
[Feature] Link Editing Link components (LinkControl, URLInput) and integrations (RichText link formatting) [Status] In Progress Tracking issues with work in progress [Type] Regression Related to a regression in the latest release

Comments

@richtabor
Member

In LinkControl you can search for existing content on your site. This is great, but I found that attachments were surfaced higher in the search results than posts matching the search.

I propose that pages and posts of all types are prioritized in the search results, above all others. Users are much more likely to link to pages and posts than to attachments.

Visual: [screenshot of link search results]

@richtabor richtabor added [Type] Enhancement A suggestion for improvement. [Feature] Link Editing Link components (LinkControl, URLInput) and integrations (RichText link formatting) labels Jul 17, 2024
@richtabor richtabor added this to Polish Jul 17, 2024
@richtabor richtabor moved this to Needs development in Polish Jul 17, 2024
@noisysocks
Member

noisysocks commented Jul 24, 2024

I propose that pages and posts of all type are prioritized in the search results, above all others.

Take this too literally and we'll cause a regression of #56478 😀 I think we want to prioritise posts and pages but not always place them above every other result. It's important that users can easily link to tags and categories especially from the Navigation block.

@richtabor
Member Author

I'd say you are many times more likely to search for and link to a page than to a tag or category archive.

Given a page, a tag, and an attachment that all match "patterns", I would say the page should be first in that list.

Perhaps deprioritizing attachments would meet the expectations better?

@noisysocks
Member

Yeah, in your example screenshot I'd expect "Composing with patterns" to be first, but if you searched for "patterns-1" I'd expect patterns-1 to appear first.

The key thing to bear in mind is that we don't want to regress #56478 as that bug made creating some types of navigation basically impossible.

I think giving posts a slight (25%? need to play with the exact number) boost and attachments a slight penalty (25%?) should work.
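To see why a modest weight does not simply push every post above every tag, here is a minimal TypeScript sketch. The `Kind` values, the `ScoredResult` shape, and the base scores are hypothetical and chosen only for illustration; the 1.25 and 0.75 multipliers are the 25% figures suggested above.

```typescript
// Hypothetical result shape for illustration only.
type Kind = 'post-type' | 'taxonomy' | 'media';

interface ScoredResult {
	title: string;
	kind: Kind;
	score: number; // base text-match score before weighting (made-up values)
}

// Assumed weights: 25% boost for posts, 25% penalty for attachments.
const KIND_WEIGHTS: Record< Kind, number > = {
	'post-type': 1.25,
	taxonomy: 1,
	media: 0.75,
};

function weightedScore( result: ScoredResult ): number {
	return result.score * KIND_WEIGHTS[ result.kind ];
}

function rank( results: ScoredResult[] ): ScoredResult[] {
	// Copy before sorting so the caller's array is left untouched.
	return [ ...results ].sort(
		( a, b ) => weightedScore( b ) - weightedScore( a )
	);
}

const ranked = rank( [
	{ title: 'patterns-1', kind: 'media', score: 10 },
	{ title: 'Composing with patterns', kind: 'post-type', score: 10 },
	{ title: 'Patterns', kind: 'taxonomy', score: 10 },
	{ title: 'Patterns overview', kind: 'post-type', score: 5 },
] );

console.log( ranked.map( ( r ) => r.title ).join( ' > ' ) );
// Composing with patterns > Patterns > patterns-1 > Patterns overview
```

A post only overtakes another result when its base match is already close: the weakly matching "Patterns overview" (6.25 after the boost) still ranks below the tag (10), which keeps well-matching tags and categories reachable.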

@richtabor
Member Author

I'm down for trying that.

@fabiankaegy
Member

After updating to WordPress 6.7, this is something we've heard several client teams complain about. They are having a much harder time finding the relevant content than they did before the update, which is impacting their workflows.

In all honesty, I would love an option to remove attachments from the results altogether. On most sites that simply isn't something editors need to do, and if they do, they can add the link manually 🤔

The exploration of giving less prominence to attachments sounds like a good first step though 👍

I'm also going to add the [Type] Regression label to this, since the search here used to be much better than it is now in 6.7.

@fabiankaegy fabiankaegy added [Type] Regression Related to a regression in the latest release and removed [Type] Enhancement A suggestion for improvement. labels Dec 3, 2024
@SainathPoojary
Contributor

Hey @noisysocks,

I explored implementing the 25% boost for post types and the 25% penalty for attachments. Here are my findings:

Initially, I directly applied the adjustments like this:

// Boost for post types, penalty for attachments
if (result.kind === 'post-type') {
    relevanceScore *= 1.25;
} else if (result.kind === 'media') {
    relevanceScore *= 0.75;
}

However, I noticed that the recently added sorting logic was penalizing results with longer titles. This was due to the score calculation formula:

const exactMatchScore =
    (exactMatchingTokens.length / titleTokens.length) * 10;
const subMatchScore = subMatchingTokens.length / titleTokens.length;

To address this, I modified the logic to depend on the length of the search query instead of the title:

const exactMatchScore = 
    (exactMatchingTokens.length / searchTokens.length) * 10;

const subMatchScore = 
    subMatchingTokens.length / searchTokens.length;
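The effect of the two normalizations can be checked with a small TypeScript sketch that computes only the exact-match part of the score for the query "patterns". The `tokenize` below is a hypothetical stand-in (lowercase, split on non-alphanumerics) and may not match Gutenberg's own `tokenize`.

```typescript
// Hypothetical tokenizer for this sketch: lowercase, then split on any run
// of characters that are not ASCII letters or digits.
function tokenize( text: string ): string[] {
	return text.toLowerCase().split( /[^a-z0-9]+/ ).filter( Boolean );
}

// Number of title tokens that exactly equal some search token.
function exactMatches( title: string, search: string ): number {
	const searchTokens = tokenize( search );
	return tokenize( title ).filter( ( token ) => searchTokens.includes( token ) )
		.length;
}

// Original normalization: divide by the number of title tokens.
function scoreByTitleLength( title: string, search: string ): number {
	return ( exactMatches( title, search ) / tokenize( title ).length ) * 10;
}

// Proposed normalization: divide by the number of search tokens.
function scoreBySearchLength( title: string, search: string ): number {
	return ( exactMatches( title, search ) / tokenize( search ).length ) * 10;
}

for ( const title of [ 'Patterns', 'Composing with patterns' ] ) {
	console.log(
		title,
		scoreByTitleLength( title, 'patterns' ).toFixed( 2 ),
		scoreBySearchLength( title, 'patterns' ).toFixed( 2 )
	);
}
// Patterns 10.00 10.00
// Composing with patterns 3.33 10.00
```

With title-length normalization the three-token title scores a third of the one-token title; with search-length normalization both score 10, so titles containing the query no longer differ by length and their order is left to other factors such as the kind weighting.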

This worked better, but exact string matches were still being ranked lower than post types. To resolve this, I added a significant boost for exact title matches to ensure they appear at the top:

// Significant boost for exact title matches
if (result.title.toLowerCase() === search.toLowerCase()) {
    relevanceScore *= 100;
}
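A minimal TypeScript sketch of why the exact-title boost matters, mirroring the scoring in this comment with a hypothetical `tokenize` (an assumption, not Gutenberg's). For the query "patterns", a post titled "Composing with patterns" gets the 25% post boost (12.5) and, without the boost, would outrank a tag titled exactly "Patterns" (10).

```typescript
type Kind = 'post-type' | 'taxonomy' | 'media';

// Hypothetical result shape for illustration only.
interface Result {
	title: string;
	kind: Kind;
}

// Hypothetical tokenizer: lowercase, split on non-alphanumeric runs.
function tokenize( text: string ): string[] {
	return text.toLowerCase().split( /[^a-z0-9]+/ ).filter( Boolean );
}

// Search-length normalization, a 25% kind weight, and an optional
// x100 boost when the whole title equals the search string.
function relevance( result: Result, search: string, exactBoost: boolean ): number {
	const searchTokens = tokenize( search );
	const titleTokens = tokenize( result.title );
	const exact = titleTokens.filter( ( t ) => searchTokens.includes( t ) ).length;
	const sub = titleTokens.filter( ( t ) =>
		searchTokens.some( ( s ) => t !== s && t.includes( s ) )
	).length;

	let score = ( exact / searchTokens.length ) * 10 + sub / searchTokens.length;
	if ( result.kind === 'post-type' ) {
		score *= 1.25;
	} else if ( result.kind === 'media' ) {
		score *= 0.75;
	}
	if ( exactBoost && result.title.toLowerCase() === search.toLowerCase() ) {
		score *= 100;
	}
	return score;
}

function topTitle( results: Result[], search: string, exactBoost: boolean ): string {
	const sorted = [ ...results ].sort(
		( a, b ) => relevance( b, search, exactBoost ) - relevance( a, search, exactBoost )
	);
	return sorted[ 0 ].title;
}

const results: Result[] = [
	{ title: 'Composing with patterns', kind: 'post-type' },
	{ title: 'Patterns', kind: 'taxonomy' },
];

console.log( topTitle( results, 'patterns', false ) ); // Composing with patterns (12.5 vs 10)
console.log( topTitle( results, 'patterns', true ) ); // Patterns (12.5 vs 1000)
```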

Currently, the ranking logic is functioning as follows (I’ll share a video to demonstrate this). I’ve also retested the previously implemented fixes to ensure they aren’t breaking anything, and everything appears to be working as expected.

Do you think this approach is good to proceed with, or would you suggest any additional changes? If everything looks good, I’ll raise a PR with these updates.

Thanks!

Complete code:

export function sortResults( results: SearchResult[], search: string ) {
	const searchTokens = tokenize( search );

	const scores = {};
	for ( const result of results ) {
		if ( result.title ) {
			const titleTokens = tokenize( result.title );
			const exactMatchingTokens = titleTokens.filter( ( titleToken ) =>
				searchTokens.some(
					( searchToken ) => titleToken === searchToken
				)
			);
			const subMatchingTokens = titleTokens.filter( ( titleToken ) =>
				searchTokens.some(
					( searchToken ) =>
						titleToken !== searchToken &&
						titleToken.includes( searchToken )
				)
			);

			// The score is a combination of exact matches and sub-matches.
			// More weight is given to exact matches, as they are more relevant (e.g. "cat" vs "caterpillar").
			// Dividing by the number of search tokens (rather than title tokens)
			// normalizes the score without penalizing longer titles.
			const exactMatchScore =
				( exactMatchingTokens.length / searchTokens.length ) * 10;

			const subMatchScore =
				subMatchingTokens.length / searchTokens.length;

			let relevanceScore = exactMatchScore + subMatchScore;

			// Boost for post types, penalty for attachments
			if ( result.kind === 'post-type' ) {
				relevanceScore *= 1.25;
			} else if ( result.kind === 'media' ) {
				relevanceScore *= 0.75;
			}

			// Significant boost for exact title matches
			if ( result.title.toLowerCase() === search.toLowerCase() ) {
				relevanceScore *= 100;
			}

			scores[ result.id ] = relevanceScore;
		} else {
			scores[ result.id ] = 0;
		}
	}

	return results.sort( ( a, b ) => scores[ b.id ] - scores[ a.id ] );
}

Preview

Current Implementation:

Screen.Recording.2024-12-09.at.7.50.25.PM.mov

I also tested whether the current changes break the fix previously added in #67367:

Screen.Recording.2024-12-09.at.7.51.50.PM.mov

@getdave
Contributor

getdave commented Dec 10, 2024

There's #67563, which has a working prioritisation of Posts. I would love some reviews on that and/or code contributions to tweak it towards what we need.

@SainathPoojary
Contributor

Hi @getdave,

I tested the solution in #67563, and it seems to work well for me overall. Initially, I thought it might cause the regression mentioned in #56478, but after further testing, I was unable to reproduce the issue.

Specifically, I created multiple posts using the following commands:

wp post create --post_title="The Ultimate Adventure Awaits" --post_status=publish --post_type=post
wp post create --post_title="An Unexpected Adventure Begins" --post_status=publish --post_type=post
wp post create --post_title="Adventure Through the Mountains" --post_status=publish --post_type=post
wp post create --post_title="A Day Full of Adventure and Fun" --post_status=publish --post_type=post
wp post create --post_title="Lost in an Adventure Wonderland" --post_status=publish --post_type=post
wp post create --post_title="The Adventure of a Lifetime" --post_status=publish --post_type=post
wp post create --post_title="Embarking on a New Adventure" --post_status=publish --post_type=post
wp post create --post_title="Adventure Calls from Afar" --post_status=publish --post_type=post
wp post create --post_title="Exploring the Jungle of Adventure" --post_status=publish --post_type=post
wp post create --post_title="The Secret Adventure Revealed" --post_status=publish --post_type=post
wp post create --post_title="Adventure by the Ocean Waves" --post_status=publish --post_type=post
wp post create --post_title="A Thrilling Adventure Awaits You" --post_status=publish --post_type=post
wp post create --post_title="The Magical World of Adventure" --post_status=publish --post_type=post
wp post create --post_title="Adventure in the Unknown Realms" --post_status=publish --post_type=post
wp post create --post_title="An Epic Adventure Through Time" --post_status=publish --post_type=post
wp post create --post_title="Adventure in the Heart of the Forest" --post_status=publish --post_type=post
wp post create --post_title="The Great Adventure Journey" --post_status=publish --post_type=post
wp post create --post_title="Adventure with Friends and Family" --post_status=publish --post_type=post
wp post create --post_title="Chasing the Spirit of Adventure" --post_status=publish --post_type=post
wp post create --post_title="An Unforgettable Adventure Experience" --post_status=publish --post_type=post

I also created one category and one attachment named "Adventure".

In my tests, the sorting behavior appears to work correctly.

2024-12-10.21-15-04.mp4

@noisysocks, could you confirm if the test cases for this scenario accurately validate the issue described in #56478?

I’d appreciate your thoughts on this, @getdave.

@t-hamano
Contributor

I'm beginning to think that an approach that relies solely on weighting may not be able to fully solve this problem. Search results are always limited to a maximum of 20 results, but what users want to prioritize can vary infinitely depending on user preferences and site content.

Maybe the UI itself needs some improvements, like the following (please excuse the clumsy design 😅):

Add a button to load more search results (mockup image).

Allow search results to be filtered by type (mockup image).

@WordPress/gutenberg-design Any ideas?

@jasmussen
Contributor

A dropdown to let you filter by type could be useful (I could see a filter dropdown living inside the input). The only hesitation is that this doesn't solve the main issue at hand, which is that the default search should either emphasize things that are not attachments or de-emphasize attachments. We might even omit attachments from the suggestions entirely; IMO the main flow for linking to attachments is through the media library.

@getdave
Contributor

getdave commented Dec 16, 2024

To provide everyone with context, attachments were added because there are users who want to link to documents (e.g. PDFs). It's quite common.

I would support @t-hamano's proposal in conjunction with improving the weighting.

What I would say in terms of design is that I remember this being explored previously and we quickly realised we'd need additional tabs other than All, Posts, Terms and Media. So whatever design is proposed it needs to be able to scale.

Thanks for the dialogue here. Great to see 👍

@jameskoster
Contributor

I agree that filtering would be a useful enhancement, and it could theoretically be used to solve this issue. A simple way to start might be to add two tabs:

  • "Content" – all post types and taxonomies (selected by default)
  • "Media"
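The two-tab idea can be sketched as a mapping from each result's kind to a tab. This is a minimal TypeScript sketch: the `Result` shape, the `Tab` names, and the `kind` values ('post-type', 'taxonomy', 'media', following the scoring code earlier in the thread) are assumptions, not LinkControl's actual API.

```typescript
// Hypothetical types for illustration only.
type Kind = 'post-type' | 'taxonomy' | 'media';
type Tab = 'content' | 'media';

interface Result {
	id: number;
	title: string;
	kind: Kind;
}

// "Content" covers all post types and taxonomies; "Media" covers attachments.
function tabFor( result: Result ): Tab {
	return result.kind === 'media' ? 'media' : 'content';
}

function filterByTab( results: Result[], tab: Tab ): Result[] {
	return results.filter( ( result ) => tabFor( result ) === tab );
}

const results: Result[] = [
	{ id: 1, title: 'Composing with patterns', kind: 'post-type' },
	{ id: 2, title: 'Patterns', kind: 'taxonomy' },
	{ id: 3, title: 'patterns-1', kind: 'media' },
];

// "Content" is selected by default, so attachments stay hidden until the
// user switches to the "Media" tab.
console.log( filterByTab( results, 'content' ).map( ( r ) => r.title ).join( ', ' ) );
// Composing with patterns, Patterns
console.log( filterByTab( results, 'media' ).map( ( r ) => r.title ).join( ', ' ) );
// patterns-1
```

Keeping the kind-to-tab mapping in one function is one way to leave room for the additional tabs getdave mentions. Because search results are capped at 20, a real implementation would probably need to pass the selected tab to the search request rather than filter an already-truncated list.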

@github-actions github-actions bot added the [Status] In Progress Tracking issues with work in progress label Dec 31, 2024
Projects
Status: Needs development

8 participants