Difficulty Finding SuttaCentral in Google Searches

I just looked at the page using the WAVE accessibility plugin, and the only structure it was able to find on the page was from the “Oh No!” text. It also found no page regions or ARIA landmarks.

Maybe you have already done an accessibility audit, but if not, that could be a place to start. Making sure that the page meets those standards should be a baseline, and perhaps that would help with other things.
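
A rough sketch of what such a baseline check looks for, using only Python’s standard library. It counts heading tags and landmark elements or ARIA landmark roles in a chunk of HTML. This only illustrates the idea and is not how WAVE itself works; treating every `header` and `footer` as a landmark is a simplification, and `audit` and the sample markup are made up for the example.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Simplified: HTML5 sectioning elements that usually map to landmarks.
LANDMARK_TAGS = {"header", "nav", "main", "aside", "footer"}
LANDMARK_ROLES = {"banner", "navigation", "main", "complementary",
                  "contentinfo", "search"}


class StructureAudit(HTMLParser):
    """Counts headings and landmark regions found in static HTML."""

    def __init__(self):
        super().__init__()
        self.headings = 0
        self.landmarks = 0

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.headings += 1
        role = dict(attrs).get("role")
        if tag in LANDMARK_TAGS or role in LANDMARK_ROLES:
            self.landmarks += 1


def audit(html):
    """Hypothetical helper: return heading and landmark counts for a page."""
    parser = StructureAudit()
    parser.feed(html)
    return {"headings": parser.headings, "landmarks": parser.landmarks}


# Invented sample markup: an empty application shell.
print(audit("<div id='app'></div>"))
```

A shell with no headings and no landmarks gives zero for both counts, which is roughly the picture the WAVE result describes.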

I think that analysis tool may not mean what you think it means.

Have you added the site to Google Search Console? @phineas-pta mentioned it. I have it set up for a few of my sites, and it sends alerts when there are problems on a page.

It does seem that there is technology used on SuttaCentral that works well for sighted humans, but not so well for code.

Sorry for the confusion. For the personal injury lawyer website I maintain, which consistently ranks well, I use WordPress and the Yoast SEO plugin. It has a section called something like SEO Title, and I write my title tag there, focusing on the primary and secondary keywords I want to rank for. That title always shows up in Google searches. The H1 tag is the title of my post or page. I use keywords in it, too.

It seems like I’m not familiar enough with the advanced technology SC uses to be able to offer much help. WordPress and Yoast take care of the more technical side of things for me.

I do find it surprising, though, that Bing doesn’t display the title tag for SC’s MN 41 in its search results. I also find it surprising that I can’t find SC’s MN 41 (besides the legacy page) after looking through many pages of Google results. Sorry I wasn’t more help.

Just a wild guess, but does Google know that Sāleyyakasutta also means Sāleyyaka sutta? Or, even if it does, does the lack of “sutta” on its own push it down the list?

Furthermore, I don’t get a SuttaCentral hit for Sāleyyakasutta in the first page or so:
https://www.google.com/search?q=Sāleyyakasutta&oq=Sāleyyakasutta&aqs=chrome..69i57j69i61&sourceid=chrome&ie=UTF-8

I think it means exactly what I think it means. :wink:

Good point. I brought this up a couple of years ago in Searching SC: Limited Hits for Anapana - #21 by sujato

  1. In the Mindfulness of Breathing Sutta, and perhaps other suttas, anapanasati is spelled in a way that doesn’t seem to be commonly used, namely Ānāpānassatisutta. Google doesn’t recognize that spelling, whereas it clearly recognizes anapanasati. So, although the content in MN 118 is fantastic, SC will likely never rank for anapanasati, and may be hindered by not having the common spelling as content on its page.

Since “sutta” is not a commonly searched term anyway, it is probably even less likely that Google would understand that the title Sāleyyakasutta means Sāleyyaka sutta.

Interestingly, SuttaCentral still does not show up in a search for “anapanasati sutta”. An SC Discuss & Discover post, on the other hand, does show up after about three pages. According to Google’s Keyword Planner, “anapanasati sutta” is one of the most commonly searched terms related to suttas, with about 1,000 to 10,000 monthly searches worldwide; no other EBT is searched more.

Screenshot 2021-05-18 6.37.41 PM

Hope this helps.

Great! Then you know that it doesn’t mean very much. :slight_smile: From the research I have done, it seems that the things that contribute to the Lighthouse SEO score are extremely limited.

When I run the Lighthouse test in Chrome on https://suttacentral.net/mn118/en/sujato (Lighthouse has to be run on individual pages) it does get a 100% SEO score. But it says to run the Structured Data Tool. When I run that, it says there are zero warnings or errors. But it also says there are zero items. So it’s not finding any content? It also has a general message saying

So I run the Rich Results Test.

Then I get this message

When I click on the View Rendered HTML, it just shows what we can see when we do a View Page Source in our browser.

So this leads me to believe there is a chance that Google is not seeing the content of the page. I’m not an expert, or probably even qualified at all to make this assertion. But hey, this is the internet!
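
One way to test this suspicion by hand, sketched with Python’s standard library only: take the HTML as delivered (before any JavaScript runs), drop script and style content, and check whether a phrase from the sutta appears in what remains. The sample markup, including the `sc-site-layout` element name, is invented for illustration and is not taken from SC’s actual source; `phrase_in_static_html` is a hypothetical helper.

```python
from html.parser import HTMLParser


class StaticText(HTMLParser):
    """Collects text found in the raw HTML, ignoring script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def phrase_in_static_html(html, phrase):
    """True if the phrase is present without running any JavaScript."""
    parser = StaticText()
    parser.feed(html)
    parser.close()
    return phrase.lower() in " ".join(parser.chunks).lower()


# Invented example: a shell whose content would be filled in by a script.
shell = (
    "<html><head><title>SuttaCentral</title>"
    "<script>var text = 'mindfulness of breathing';</script></head>"
    "<body><sc-site-layout></sc-site-layout></body></html>"
)
print(phrase_in_static_html(shell, "breathing"))
```

If a check like this fails on the delivered HTML but the text is visible in a browser, then the content only exists after scripts have run, and any reader that does not run them will see an empty page.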

I did try a site-specific Google search for a term that appears quite a bit, but not too often: stilt longhouse (stilt longhouse site:suttacentral.net). I get 18 results:

9 are to the discussion forum
3 were for suttas that included the words “stilt longhouse” (1, 2, 3)
6 were just suttas that are part of the “2. Shaking the Stilt Longhouse” division of SN 51

When I do a search on SuttaCentral.net for “stilt longhouse” I get 57 results.

Another test that Google recommends is the Structured Data Linter. So I run that test on mn118/en/sujato and get this.

So it looks like none of the structured data applies to this specific page. (Could this be why link cards in D&D don’t show the correct info?)

Winding my way around recommended documentation, I come to this interesting page.

I run the recommended Mobile-Friendly Test on mn118/en/sujato.

Good news!

But… Oh…

Yeah. So the Rendered Page? It’s blank.

More ideas?

I’m guessing this is a result of the difference between “and” and “or” in keyword matching. Google normally uses the “and” function, so both words need to be in the article, and it only shows 18 results. SC search uses the “or” function, so if either of the keywords is in the article, it shows up in the 57 results.

I could be wrong, but it sounds like there are 18 articles which use the words “stilt” and “longhouse”, and 57 articles which use the words “stilt” or “longhouse”.

Edit: I did several Google searches to show this point:
“stilt OR longhouse site:suttacentral.net” says, “About 62 results”
“stilt AND longhouse site:suttacentral.net” says, “About 22 results”
“stilt longhouse site:suttacentral.net” says, “About 25 results”
“longhouse site:suttacentral.net” says, “About 35 results”
“stilt site:suttacentral.net” says, “About 54 results”

And oddly enough, if I google ‘“stilt longhouse” site:suttacentral.net’ (with “stilt longhouse” in quotes, which specifically asks Google to use the AND function and additionally requires the words to appear next to each other in this order), I get “About 40 results” (more results than without the quotes or with the AND function).
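
To make the three matching modes concrete, here is a toy sketch in Python. It assumes a very naive whitespace tokenizer, and the sample texts are made up; it says nothing about how Google or SC search actually work internally.

```python
def words(text):
    """Lowercase and split on whitespace; deliberately naive."""
    return text.lower().split()


def matches_or(doc, terms):
    doc_words = set(words(doc))
    return any(term in doc_words for term in terms)


def matches_and(doc, terms):
    doc_words = set(words(doc))
    return all(term in doc_words for term in terms)


def matches_phrase(doc, phrase):
    doc_words, target = words(doc), words(phrase)
    n = len(target)
    return any(doc_words[i:i + n] == target
               for i in range(len(doc_words) - n + 1))


# Made-up sample texts, not real search results.
docs = [
    "he shook the stilt longhouse with his toe",  # both words, adjacent
    "a longhouse built on a single stilt",        # both words, not adjacent
    "the mendicants entered the longhouse",       # one word only
    "the heron stood like a stilt",               # one word only
    "thus have i heard",                          # neither word
]
terms = ["stilt", "longhouse"]

or_hits = sum(matches_or(d, terms) for d in docs)
and_hits = sum(matches_and(d, terms) for d in docs)
phrase_hits = sum(matches_phrase(d, "stilt longhouse") for d in docs)
print(or_hits, and_hits, phrase_hits)  # 4 2 1
```

Every phrase match is also an AND match, and every AND match is also an OR match, so with exact counts over a fixed set of pages a quoted phrase could never return more pages than the AND search does.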

But how many results did you actually get? When I did my search, it said “About 80”, but when I actually counted them, there were only 18. The “About” number is really meaningless.

And if you do my exact search on SuttaCentral,
https://suttacentral.net/search?query=%22stilt%20longhouse%22

You will see that I put quotation marks around it and the 57 is only results with the exact phrase. And it’s not individual occurrences of the phrase, but pages with the phrase. Take a look at the search for yourself.

So in my reckoning, if we disregard the suttas that only have that phrase in the chapter heading, Google has 3 out of 51 pages with the phrase: about 6%.

What’s curious is how it got those three. The Mobile-Friendly Test turns up blank pages for them too.

When I look at the Mobile-Friendly Test, I can see that some of the website is loading. If I open the Inspector and search for the word Buddha, it appears 11 times in the HTML.

The title and main text of the sutta are not loading. “Breathing” returns 0 matches.

If I go to https://suttacentral.net/mn118/en/sujato and use Firefox’s built-in Responsive Design Mode (ctrl-shift-M), I see the text at first. The text is still there when I refresh with F5, but it disappears and the page goes blank if I press ctrl-F5 (forced refresh). I think it might be loading the page from cache, but not when I force it to download new files.

If I open a “New Private Window” in Firefox, I see the same thing: a blank white screen and some HTML tags from SC (although “Buddha” only shows up 5 times, instead of 11). This shows nothing but a blank screen whether I use F5 or ctrl-F5.

Now I install Google Chrome on my computer and go to the link, and it’s also blank. Just HTML tags and a white screen. But when I check to see if Google loads (it does), then click back, suddenly the text is there! This text also disappears when I use ctrl-F5, and reappears with a regular F5 refresh.

There is definitely something wonky going on, but I’m not familiar enough with how this site is loading data.

I wonder if one of the big issues is crawlability and the sitemap? I ran suttacentral.net through the Semrush site audit, and it found 7 errors, which Semrush defines as critical. One error was that SC’s sitemap page returned a 4xx status code. Here’s what Semrush says about the error it found with the sitemap page:

Screenshot 2021-05-19 2.24.27 PM

Screenshot 2021-05-19 2.12.28 PM

That’s a standard file not found (404) error, and would probably cause problems with Google’s ability to crawl the site.

Google has a doc that describes how to build and submit a sitemap, if someone wants to make one.
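
For anyone curious what the file itself looks like, here is a minimal sketch that builds one with Python’s standard library, following the sitemaps.org `urlset`/`url`/`loc` layout. The function name `build_sitemap` is made up, the MN 41 URL is only assumed to follow the same pattern as the MN 118 one, and a real sitemap for a site this size would be generated from the site’s own data and probably split into several files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the text for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative URLs; the second one is an assumed pattern.
sitemap_xml = build_sitemap([
    "https://suttacentral.net/mn118/en/sujato",
    "https://suttacentral.net/mn41/en/sujato",
])
print(sitemap_xml)
```

The output would be saved as something like `sitemap.xml` at the site root and then submitted through Search Console.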

Thanks everyone for this feedback. I’ll hand this over to our developers and see if we can address the issue.

It sounds like sitemaps are especially important for large websites like SC:

What websites need an XML sitemap?

Google’s documentation says sitemaps are beneficial for “really large websites,” for “websites with large archives,” for “new websites with just a few external links to it,” and for “websites which use rich media content.”

While we agree that these kinds of websites will definitely benefit the most from having one, at Yoast, we think XML sitemaps are beneficial for every website. Every single website needs Google to be able to find the most important pages easily and to know when they’re last updated.

https://yoast.com/what-is-an-xml-sitemap-and-why-should-you-have-one/

Because SC has a legacy site and multiple versions of the same sutta, which is likely to confuse web crawlers, SC may benefit even more than the typical large websites Google recommends sitemaps for.

We have had a sitemap in the past; I’ll have to check what the current status is.

I do think that a sitemap is essential. However, it is only part of the story. It helps Google crawl the site, but if Google can’t read the content, then it’s not going to be able to do much with it.

One of the most obvious issues is present with us here in D&D, namely the inability of Discourse to find the info it needs to create the link cards. I think it is just scraping from the HTML that we can see when we do a View Page Source. But that HTML is the same for every page. This also ties in to the problem with relying on the Lighthouse SEO score. It tests to see if there is content in the tag. The fact that it finds identical information on all the pages doesn’t matter to the score.

Facebook also finds nothing but this default information when it scrapes the page.
This is for https://suttacentral.net/mn118/en/sujato
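
A sketch of the kind of scraping a link card generator does, assuming it reads the `<title>` and `<meta>` tags (for example Open Graph `og:` properties) from the delivered HTML without running scripts. The helper names and the sample head content are invented for illustration and are not copied from SC’s pages.

```python
from html.parser import HTMLParser


class HeadMetadata(HTMLParser):
    """Collects the <title> text and <meta> name/property values."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attributes.get("property") or attributes.get("name")
            if key and attributes.get("content") is not None:
                self.fields[key] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def link_card_fields(html):
    """Hypothetical helper: what a scraper could learn from static HTML."""
    parser = HeadMetadata()
    parser.feed(html)
    return parser.fields


def all_pages_look_identical(pages):
    """True if every page exposes exactly the same metadata."""
    cards = [link_card_fields(html) for html in pages.values()]
    return all(card == cards[0] for card in cards)


# Invented head shared by two different pages.
shared_head = (
    '<head><title>SuttaCentral</title>'
    '<meta property="og:title" content="SuttaCentral">'
    '<meta name="description" content="Early Buddhist texts"></head>'
)
pages = {"/mn118/en/sujato": shared_head, "/mn41/en/sujato": shared_head}
print(all_pages_look_identical(pages))  # True
```

When every URL returns the same head, a scraper has no way to tell one sutta from another, while a test that only checks whether a title or description exists would still pass.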

Also, when you try to use Google Translate to translate a whole page via its link, it fails. Push to Kindle fails. Importing a URL into a speed reader app fails.

It’s great that the site is so easy for sighted humans to access. But clearly there is a problem with machines being able to read the site.

Yes, it sure looks like that. Initially, I thought this was an SEO issue: that SC wasn’t ranking well on Google for sutta-related searches. However, the problem appears much deeper than that: SC’s sutta pages don’t seem to show up in Google search results at all for many, if not most, of the common searches related to suttas.

That said, SC’s homepage does rank well for the simple search “sutta”, but the individual sutta pages don’t rank well, or at all, for common sutta search terms. So, there must be some significant difference between how the homepage and the other pages are set up.

Digging a little deeper, I see that JavaScript is not working when the site is blank.

If I go to https://suttacentral.net/mn118/en/sujato, I am able to access JavaScript functions in the console. If I type the function “removeSiteContent()”, it removes all the text and leaves me with a blank page, just like I see when I hit ctrl-F5. When I type “appendSiteContent()”, the text reappears like magic!

The problem is when I do a ctrl-F5 and start with a blank-looking screen. Now those same functions do not work, even though the functions do show up in the HTML when I “view page source”.

I have no idea how the JavaScript functions can show up in the source code and yet, when I try to use them, the console says “appendSiteContent() is not defined”. But that’s the situation.

This will stop Google from being able to view the text on the pages. Google’s crawler will request a fresh copy of the files, just like ctrl-F5. So if ctrl-F5 shows a blank screen, that’s exactly what Google’s crawler will see too.

Thanks everyone, we’re looking into it. Unfortunately, it’s not easy to see what will actually work.

Blake has adjusted some of our configuration, so when that goes live let’s see if anything is improved.

(It turns out there was some metadata that declared the home page to be the canonical source for all pages. We have fixed this, but it is unclear whether it was the source of the problem.)
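
For reference, this kind of metadata is normally a `<link rel="canonical">` tag in the page head. The sketch below, using Python’s standard library, flags a page whose canonical link points somewhere other than the page itself. The helper name and sample markup are made up, and comparing URLs after stripping a trailing slash is a simplification.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_mismatch(page_url, html):
    """True if the page declares a different URL as its canonical source."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return False
    return finder.canonical.rstrip("/") != page_url.rstrip("/")


# Invented head: a sutta page that declares the home page as canonical.
head = '<head><link rel="canonical" href="https://suttacentral.net/"></head>'
print(canonical_mismatch("https://suttacentral.net/mn118/en/sujato", head))  # True
```

A canonical link tells a search engine which URL to treat as the primary copy of the content, so a sutta page that names the home page is effectively asking to be treated as a duplicate of the home page.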

You could search on Google for results that come only from suttacentral.net by using the operator “site:suttacentral.net”.

For example, for MN 41, I would type in the search bar: MN 41 site:suttacentral.net
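
If you want to bookmark or script such searches, a small sketch with Python’s standard library builds the search URL. The function name `site_search_url` is made up for the example.

```python
from urllib.parse import urlencode


def site_search_url(query, site="suttacentral.net"):
    """Build a Google search URL restricted to one site."""
    return "https://www.google.com/search?" + urlencode(
        {"q": f"{query} site:{site}"}
    )


print(site_search_url("MN 41"))
# https://www.google.com/search?q=MN+41+site%3Asuttacentral.net
```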

The key to searching for things in SC (which took me a long time to find out for myself :grinning:) is NOT to use the SC search feature, but to use Google and put the text ‘suttacentral’ into your search text.
E.g., when I enter ‘suttacentral MN 41’ into Google, the correct link comes up right at the top.
Ditto when I search for ‘suttacentral brahmins of sala’.
Hope this helps.
