Posted by: Diabolic Preacher | September 2, 2006

Searching Blogs…Precisely

Finally, search engines are putting in more effort to make blogs easier to locate in the blogosphere and to present its apparently chaotic discussions in an orderly manner…provided the user searches specifically, in the following ways.

For a long time I’ve wished there was some easy way to search through my journal, as I’ve posted a lot of useful (ok ok, useful to me) information that I often need to refer back to, either to check whether I’ve already posted on an issue so I can reference it in newer articles, or to help my friends use that information without going through the 250-and-counting entries manually. The options available for searching LiveJournal, which does not allow external code for basic accounts, have increased a lot (or at least I have gradually discovered more and more of them), but the ways of pinpointing information are still very few. The reason I’m focussing on LiveJournal searching is that LJ posts have a numerical id for the permalink html, such as 66394.html. That makes the url itself useless when you want to search for terms relevant to the post. Let’s look at some of the options that have developed so far:

  1. LJSeek: LiveJournal initially didn’t have a proper full-fledged search engine specially crafted to locate journals and communities within it and to search for content within those journals using any combination of filters. It gave a link to LJSeek and said “we believe they know a little bit about searching…we don’t know much about others”. In fact there literally weren’t many others at that time. Blog update checkers like icerocket, feedburner and weblogs.com are big screwups even today. LJSeek back then was so crude compared to what it tries to achieve today that you would have congratulated them if they could locate a journal when you had typed the username correctly. It was not at all useful, and I often ended up typing out again a lot of stuff that I had typed before. Will explain this re-typing thing later on.
  2. Blog Update Notifiers: You can check a list of these services at Pingoat. Once the page loads up, just scroll down and you can see a list of services…in no way complete, as they are always on the rise. Now these sites index your blog on a regular basis, which is kinda occasional and not so useful if you post time-critical information (e.g. gmail accounts have been hacked with just the username!!). So what you need to do is poke these services, or in blog terminology ‘ping’ them, so that they send their spiders/bots/crawlers/indexers to visit your site, read the syndicated feed to get the article in part or in full, and index the content under various criteria. Initially LJ didn’t have its own tag system, and since the template wasn’t editable, there was no way to generate tag links to generic services like technorati or, for that matter, any blog search service that indexed blogs based on the tags or keywords the entries had.
    • Technorati: This service gets a separate mention as it is one of the more popular ones, especially among bloggers and those searching for news and related content in the blogosphere. It also has a ranking system which helps you pick the blogs that more people refer or link to, and find more authentic content, i.e. content that is referenced by many relevant sources or that itself quotes many relevant sources. Currently Technorati indexes around 52.7 million blogs. Technorati is geared more towards searching for content than towards locating your friends’ blogs. Check out their discover section. Technorati was, however, one of the few services that allowed searching within a journal…provided you could embed a code snippet within the template of your blog, so that technorati could verify that the blog is owned by you and let your visitors search your blog(s) via the technorati service. In the initial stages, technorati didn’t cater to the entire livejournal population, or for that matter probably to any of the blogs on other services that run on the livejournal code and follow livejournal’s policy of template editing restrictions. Livejournal, one of the services that got popular quickly after blogspot in spite of being built on open source tools, had a lot of content stored in it waiting to be presented in an orderly fashion. At present LJ has 11,047,229 journals and communities, and fortunately (it might have been a year or so) technorati has a special provision for a livejournal user to log into his/her livejournal account (after he/she has created a technorati account) and claim the blog using the quick claim mode…so far I’ve never seen it working…who knows where the username and password goes! 🙂 The other method, the one that works, is to post the claim code (a simple hyperlink with a claim key at the end of the url) as a simple journal entry such as this one.
This worked for me when I tried it sometime last year, and it worked for my friend prashob, whom I added to technorati yesterday. So finally I had some sort of search engine that indexed my entries in a meaningful way…i.e. I could help users locate information already posted, and I could avoid redundant content to a larger extent than before. However, unlike on services which allow editing of templates, you cannot embed a search box within a livejournal blog; you can only post a link, or add a link to the links list, pointing to the search page for your livejournal blog…basically the technorati profile page of your livejournal blog. This page also contains the tags you use most frequently.
      pause a bit: thinking where the hell tags came from? Well, technorati, it seems, kinda started understanding the livejournal system much better, and much more than it had ever bothered to know about the blogosphere breeding within. Livejournal had introduced a tagging system for its members, where they could add a list of tags (separated by commas) in a separate field. Technorati uses this to the livejournal user’s advantage and eliminates the boring requirement of manually typing in the specially crafted links to the technorati pages corresponding to the tags you mark your post with. One thing that LiveJournal could do is use some ajax to recall the list of tags already used by the user in live auto-complete mode, like simpy and del.icio.us. I suppose one reason they don’t do this is that there is an option to compose an entry without logging in, and to log in only when you need to post the stuff. So for all those who need to try out little bits and pieces of html code, or check how an html formatted post will look when published live…you could use this link and click on preview…you could also do spell check…now that’s something others won’t let you have without an account…or do they?
      …Coming back to the blog profile page on technorati, the other parts are the lists of inbound and outbound links and a few of the latest entries that technorati has indexed from your blog…the sooner you ping, the earlier your latest content gets indexed. It also shows your posting frequency in a bar chart and gives your blog a ranking depending on how many blogs link to you and how many links in all there are from those blogs. Last but not least, they give a search box to search the journal/blog as well. My LJ blog’s profile page is here and the inbound links are listed over here.
  3. Google BlogSearch: Been waiting for this for a long long time. Google is one company that comes, sees and conquers…midway through any established competition related to search. If you were following along, Google let out, before it came out with its blog-specific search index/engine, that it was indexing as much of the blogosphere as it could…starting from blogger/blog*spot, which it had bought from Pyra Labs just a li’l while before that. It also gave the owner of a blog a way to opt out of the index…that is, to not make his/her blog searchable through the google blogsearch engine. Dunno if that option still exists. I for one am really happy that Google finally did a great job at making my journal more useful than ever before. Surprisingly, Google indexed my journal and searched content within it so easily, despite all the stupid waah waah’s of the other indexing services coz livejournal didn’t allow editing of the template or inclusion of external scripts (which it doesn’t, in order to reduce the harm that can be caused by malicious code). Google really lives up to its name, as you can literally pinpoint content within a blog or find blogs having the content/phrase you are searching for. There is no longer any necessity for bloggers to classify their blog posts (which nobody usually cares to do anyway). Skipping it is really not a good habit though, especially on livejournal or whichever blogging system allows you to have tags or categories, since those let you filter your single blog into various sub-blogs and share them with different people with different interests. Initially I’d figured out a crude way of searching content in my journal, and so far it has returned highly accurate results nearly every time. I prefixed the search term with the word pintooo15, which is my username and is somewhat unique (as is perhaps the requirement of the majority of/all services where you can create an account). So it’d be “pintooo15 <search term(s)>”.
But what if there were users on other services who had accounts with the same username and posted some other unrelated crap? This is where a special keyword called blogurl comes into play.
    • blogurl keyword: Experts in Google searching, especially the page-lifters (plagiarists) of most academic institutions, would know/understand the working of the special operator keywords that a search engine provides for advanced searching. blogurl helps in searching a journal and does a better job than technorati, which confuses its classification of my blog and leaves my blog outdated for over 15 days (although their efforts to rectify problems with a user’s blog listing, and the time they take over it, are reasonable enough for a free service). An example of its usage is as follows (with some setting up of the scene :D): A friend, during a chat session, was talking about having read my blog and liked the post on the ubuntuzation of the internet centre of goa university. The only way I could quickly locate the permalink to the post in my journal, to take a quick look at it, was the only method I had learnt of till then…prefix pintooo15 to the keyword “ubuntu” (in this example case). Sure enough I got the results, and got to see a special link under the first result saying something like “more results from this blog”. Clicking this url is how I got to learn about the blogurl keyword. So I could have written the query as “ubuntu blogurl:http://pintooo15.livejournal.com” (without the quotes), and that would have made sure that I was searching exclusively in my own journal. You could search for content in your blog for the following reasons/needs:
      • Your friend asks for some information about some stuff you think you might have blogged about
      • You are reviewing a newer version of a product or writing some advanced stuff that needs the user to have some basic knowledge which you have provided in an earlier post
      • For the non-tech people: you are writing a follow-up to an earlier incident. Let’s say you are following a news story. You’d like to link back with some text that’d go like “…as mentioned earlier over here…”. The news story need not be national news; it could be your own stuff as well.
      • People might use the search technique to find useful content from your blog and link to it while writing content on the same/similar topic.
      • For your reading pleasure 🙂
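
The two query styles described above (the crude username prefix and the blogurl operator) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the blogurl: syntax is copied from the query quoted in the post rather than from any official reference, and no particular search endpoint is assumed. The code only builds the text you would type into the search box.

```python
from urllib.parse import quote_plus


def prefixed_query(username: str, terms: str) -> str:
    """Crude approach: put the (hopefully unique) username before the terms."""
    return f"{username} {terms}".strip()


def blogurl_query(terms: str, blog_url: str) -> str:
    """Restrict the search to one blog using the blogurl: operator.

    The operator syntax is taken from the example query in the post;
    treat it as an assumption if the search engine has changed since.
    """
    return f"{terms} blogurl:{blog_url}".strip()


if __name__ == "__main__":
    # Hypothetical usage with the username and journal url from the post.
    print(prefixed_query("pintooo15", "ubuntu"))
    query = blogurl_query("ubuntu", "http://pintooo15.livejournal.com")
    print(query)
    # URL-encoded form, in case the query has to go into a link.
    print(quote_plus(query))
```

The prefix version can still match other people who happen to use the same username elsewhere, while the blogurl version names the journal explicitly, which is the whole point of the operator.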

Damn! I just can’t seem to reduce my blog post length. Thought of writing just one line, “blogurl rocks!”, but look what I’ve done! Over 12,400 characters! Someone said ‘tech guys go yadda yadda yadda’ and I was like ‘I keep silent and yadda yadda yadda on my blog and let it bug the voluntary visitors 😀’. I just let out all the details that flow from the mind as I brainstorm with myself over a certain topic, and at certain times the mind overflows a bit…sometimes. However, I have somehow improved, moving from one post for all purposes to topic-based posts. Don’t the regulars think so?


Responses

  1. bloglines…how cud i hav forgotten that

    well, bloglines, being the service that i most regularly use to keep track of updates on my friends’ blogs and news sites, recently introduced a search feature which is available to all visitors, not just members. members can additionally restrict the search to the feeds they subscribe to.

    the link to the search page is this

  2. Reconsider 4 souvenir

    compress

