Parsing Words From Multiple Recordsets - Please Help

Results 1 to 8 of 8

Thread: Parsing Words From Multiple Recordsets - Please Help

  1. #1
    Join Date
    Oct 2010
    Posts
    4

    Question Parsing Words From Multiple Recordsets - Please Help

    I’ve had too many sleepless nights trying to work this out – I hope I can get some help here.

    I have three distinct ADO recordsets that I need to parse words from and return to the client with sorted full word counts (less a predefined dictionary of innocuous words e.g. a, an, the, of…) across all three, references to the original record and references to the original source. The user needs to be able to drill in and out of this result set.

    I’m going to simplify the layout for clarification purposes.
    Let’s say each data source has three identical fields ‘updatedTime’ (DateTime), ‘text’(VARCHAR(255)) and ‘ID’.

    User goes to the site and searches for the word ‘frog’.
    User is returned with a word count of every record that has the word ‘frog’ in the ‘text’ field sorted by the top count down.
    ‘green’ 18 times in table A, 12 times in table B, 22 times in table C
    ‘hop’ 16 times in table A, 13 times in table B, 6 times in table C.
    ‘roadkill’ 12, 11, 4
    ‘red’ 10, 12, 3
    And so on…
    User clicks on the word ‘green’.
    Now only words with both ‘frog’ AND ‘green’ in the original record show up (again, with the counts and sorted by count).
    A breadcrumb is built, so the user can click on ‘frog’ again to kill that filter.
    User decides he doesn’t want the default timeframe (6 months) and adjusts a filter to show only records updated in the past week.
    User drills down 6 levels (the breadcrumb is updated along the way – so he can back out to any point).
    At this sixth level, he sees that only 4 entries have the words ‘frog’, ‘green’, ‘hop’, ‘legs’, ‘funny’ and ‘story’ in them and clicks on the “Results” button to show all four records in a table.

    That’s the gist of it.
    Two approaches immediately come to mind, and I’m not only unsure which would be better (more efficient and faster) but I can’t help but think I may be going about it all wrong and there is something I’m not seeing that would be much better than either option.

    Approach 1:
    Server Side:
    Combine the three recordsets into one big one, and add a fourth field that will indicate the source recordset. This will be serialized and sent to the client.

    Step through this master recordset, one record at a time, and parse the ‘text’ field in a For… Loop (using a Select… Case to ignore the dictionary of innocuous words) to build a connectionless recordset with total counts for each word. Sort this recordset by wordcount. This will be serialized and sent to the client.

    Also, in this same For… Loop, create an object or array for each word with values that point to the original record ‘ID’ (alternatively, create a master object with an attached array for each of the words). Again, this will be serialized and sent to the client.

    Client Side:
    Rebuild the two connectionless recordsets and the word objects (or master object) to create the page.
    The word list that will be displayed will be built with the wordcount recordset.
    The separate word counts for each word will be built using a client side For… Loop that will count the source references (A, B and C) in the master recordset by cross referencing the ‘ID’ from the master object (or word objects).
    When the user drills down by clicking on the word, a filter will be applied to the wordcount, to filter out any of the records which do not have that word in it.
    When the user clicks on the “Display” button, the remaining records will be referenced against the master object (using the ‘ID’) to find the records in the master recordset to display.

    Approach 2:
    Similar to approach 1, but rather than building the word objects (or master object) when the user clicks on a word, a find operation will run against the master recordset on the client side to build a filter for the word list.

    No matter how I look at this, it seems far messier and less efficient than it should be, but I can’t seem to find the right approach.
    Can anyone give me some guidance on this?

    Much appreciated.

  2. #2
    Join Date
    Dec 1969
    Posts
    95,999

    Default

    Big question:
    Server Side:
    Combine the three recordsets into one big one, and add a fourth field that will indicate the source recordset. This will be serialized and sent to the client.
    How much data is that??? Dozens of records, total? Or hundreds? Or thousands? Or?

    Seems to me that if you take that approach then there is nothing else that could or should be done on the server. Oh, you could build the word list on the server, but why? All the data is now on the client, so why not just do it all on the client?

    Not saying it's going to be easy, no matter what. But...

  3. #3
    Join Date
    Oct 2010
    Posts
    4

    Default

    Well, the question, I guess, is whether it would be more efficient to manipulate and sort the recordsets on the server or the client.

    I realize that the server response time will depend on the number of concurrent connections and the amount of data that's returned in their queries - but, as a baseline, if there is one user attached, which would give the user a better experience?
    If it is faster on the server, one for one, how do I find that tipping point - i.e. if there are X users, the response will be better if the data was manipulated on the client side?

    Depending on the search parameters, it could be several thousands or records - though I will probably put some kind of limiter in there to pare it down to the *top 1000, or something.
    The potential size of the recodsets is a big part of what made think processing the data at the server might be beneficial.

  4. #4
    Join Date
    Dec 1969
    Posts
    95,999

    Default

    1000 records times 1000 bytes per record: 1mb. That's high, but not horribly unreasonable. I think I'd take the time to make some experiments to see if JS is up to the task.

    I'd send 3 separate sets to JS, one per table, and then let JS do the "merge".


  5. #5
    Join Date
    Oct 2010
    Posts
    4

    Default

    I think you’re right, if I understand you correctly.
    I think I’m focusing too much on having pre-processed data on hand. I should just stream the core data to the client and process it there.

    I'm leaning towards handing two recordsets off to the client:

    Master recordset: This would be the core data from the three feeds, along with a reference to which feed it came from:
    ID | text | updateTime | source

    wordList recordset: This would be the recordset used to populate the display:
    Word | totalCount | Count1 (number of times the word shows up in source1) | Count2 | Count3

    (or just hand off Master and build wordList on the client)

    User clicks on a word in the display to drill down.
    Filter Function:
    1.) Apply a filter to wordList to remove this word from the display and redraw.
    2.) Use a find operation against the text field in Master to apply a filter to remove all entries that do not have that word in it.
    3.) Display a count of how many records are left in Master.

    4.) When the user drills down to the level where (s)he thinks the number of results is a manageable display and clicks the “Display” button. Dump Master to a table display.

    This leaves me with two questions:

    In step 2:
    Would it be better to apply recursive filters to Master and wordList in order to pare down the size of the resultsets and ultimately use Master as the display source in step 4, or should I leave these two core recordsets intact and build an additional recordset/object/array for each and update/rebuild the secondary sources as the user drills down?

    If recursive filters would be better, forgive me but I’m a bit rusty with recordset filters, how would that work? If I have a filtered recordset, can I apply a second filter to filter it down further – or do I have to build a running list to reapply at each drill level? In other words:
    User clicks Green – I apply the filter function for the word “Green”.
    User then clicks Hop. Would I just apply the filter function for the word “Hop” now to filter the filtered recordsets, or do I have to clear the current filter and apply a new comprehensive filter for “‘Green’ OR ‘Hop’”?

    I appreciate the feedback.

  6. #6
    Join Date
    Dec 1969
    Posts
    95,999

    Default

    I would not use ADODB.Recordset in the client JS code.

    Just create your own JS-only based filtering. It's not like you need anything exotic. Just really simple filtering.

  7. #7
    Join Date
    Oct 2010
    Posts
    4

    Default

    Quote Originally Posted by Bill Wilkinson View Post
    I would not use ADODB.Recordset in the client JS code.
    Why is that?
    I've read some articles that claim connectionless recordsets are very efficient when you want to do filtering, sorting and other recordset operations on the client side. Sorting is often used as an example, because the recordset sort operation is a lot less overhead hungry than the recursive rewrites it would require with other objects.

  8. #8
    Join Date
    Dec 1969
    Posts
    95,999

    Default

    Well, just for starters, it means (a) your code will only work in MSIE browsers, (b) your users will have to mark your site as "Trusted", and (c) even so they will have to grant permission to use "unsafe" scripting each time they use the page.

    If this is only for use on an inTRAnet site, then all of that is probably okay.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


More ASP Resources

Resources:
-- ASP Articles
-- ASP.NET Information
-- 4Guys ASP F.A.Q.
-- ASPFAQs.com
-- ASP Coding Tips
-- Related Web Technologies
-- User Tips!!
-- JavaScripts


Development Centers
-- HTML5 Development Center
-- Windows Mobile Development Center
-- Android Development Center
-- Cloud Development Project Center


Development Technology
Check out these sites for more articles, tutorials, FAQs, discussions, and code!
-- CodeGuru.com
-- VBForums.com
-- DevX.com
-- Developer.com
-- HTMLGoodies