    I would like a regular expression that returns the 10 most repeated words in a paragraph of text. I also want the regular expression to ignore common words such as 'a', 'the' and 'and'.
    Why do I need this? I'm trying to create meta tag keywords for an article, and I want the most repeated words from the article included in the meta tag to improve search indexing with sites such as Google.

    I don't think this is a job for a regexp. Consider this:
    1. Insert each word into a database table.
    2. Delete your common words such as 'a', 'the', etc.
    3. Then run a query like this:
    SELECT TOP 10
        COUNT(word) AS hits
      , word
    FROM word_table
    GROUP BY word
    ORDER BY hits DESC
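
For illustration, here is a minimal sketch of those three steps in Python with an in-memory SQLite table. The function name `top_words_sql`, the stop-word list, and the tokenizing pattern are assumptions, not part of the suggestion above. SQLite has no `TOP` clause (that is SQL Server syntax), so the sketch uses `LIMIT` instead.

```python
import re
import sqlite3

# Assumed stop-word list; extend it to suit your articles.
STOP_WORDS = ["a", "an", "and", "the"]

def top_words_sql(text, limit=10):
    """Return (hits, word) rows for the most repeated words in text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word_table (word TEXT)")

    # Step 1: insert each word (lowercased) into the table.
    words = re.findall(r"[a-z0-9']+", text.lower())
    conn.executemany(
        "INSERT INTO word_table (word) VALUES (?)", [(w,) for w in words]
    )

    # Step 2: delete the common words.
    placeholders = ",".join("?" * len(STOP_WORDS))
    conn.execute(
        f"DELETE FROM word_table WHERE word IN ({placeholders})", STOP_WORDS
    )

    # Step 3: count, group, and keep the top rows (LIMIT replaces TOP).
    rows = conn.execute(
        "SELECT COUNT(word) AS hits, word FROM word_table "
        "GROUP BY word ORDER BY hits DESC, word LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return rows
```

Calling `top_words_sql(article_text)` returns a list of `(hits, word)` tuples, highest count first, with ties ordered alphabetically.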

    Simply strip out all the "junk" in the text file, converting all punctuation and newline characters to, say, spaces. (This you could do with a regular expression quite nicely, of course.)
    Now SPLIT the text on the spaces to create an array of words.
    Then sort the array and count any multiple word occurrences, keeping track of only the top 10 counts.
    No DB needed, so probably faster.
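
The strip, split, sort, and count steps above could be sketched roughly as follows in Python. The function name `top_words`, the stop-word set, and the choice to keep apostrophes inside words are assumptions made for the example; the stop-word filter comes from the original question rather than from this reply.

```python
import re
from itertools import groupby

# Assumed stop-word set; extend it to suit your articles.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}

def top_words(text, limit=10, stop_words=STOP_WORDS):
    """Return (word, count) pairs for the most repeated words in text."""
    # Replace every run of characters that is not a letter, digit,
    # or apostrophe with a single space.
    cleaned = re.sub(r"[^a-z0-9']+", " ", text.lower())

    # Split on whitespace, trim stray quote marks, drop common words.
    words = [w.strip("'") for w in cleaned.split()]
    words = [w for w in words if w and w not in stop_words]

    # Sort so identical words sit next to each other, then count each run.
    words.sort()
    counts = [(word, len(list(run))) for word, run in groupby(words)]

    # Highest count first; ties fall back to alphabetical order.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:limit]
```

The returned words can then be joined with commas, for example `", ".join(word for word, _ in top_words(article_text))`, to build the keywords string for the meta tag.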

