Web Clipping

sgwillett

Can you provide me a sample of EMML that does web clipping? The current samples are all Google-based and clip the query out. I want to be able to clip out a section of the body of a document. It is several tables without ids. Thanks.

smitchell

I don't have a sample, but I do have a couple of suggestions. Do the tables you want have a class name?

If so, you can still use <directinvoke> in a mashup and then filter the result by the class name value. If necessary, you can use a separate <directinvoke> for each table.
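
As a rough sketch of the class-name approach: this assumes the page comes back as XHTML, that <directinvoke> takes endpoint and outputvariable attributes the same way the <externalinvoke> in the FedEx sample further down this thread does, and that the table carries a class name. The URL, the class name "results" and the variable names are all made up for illustration, and both variables would still need to be declared in <variables>.

```xml
<!-- Sketch only: the URL, the class name "results" and the variable names are hypothetical -->
<namespaces>
    <namespace prefix="xhtml" uri="http://www.w3.org/1999/xhtml"></namespace>
</namespaces>

<!-- Fetch the page; attribute names are assumed, check them against your EMML version -->
<directinvoke endpoint="http://www.example.com/report.html" outputvariable="page"/>

<!-- Keep only the table(s) whose class attribute is exactly "results" -->
<assign fromexpr="$page//xhtml:table[@class='results']" outputvariable="clippedTable"/>
```

Note that @class='results' is an exact match. If the table lists several class names, a predicate such as contains(@class, 'results') may be a better fit.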

If there really isn't any way to simply filter out the content, then consider using the Dapper Connector. You would need to set up an account for Dapper. Then use the connector to define a service and publish it in Presto. Dapper allows you to select content directly from a view of the page, so you should be able to select just the tables you are interested in.

Sara, technical writer/jackbe

 

polly

If the tables do not have class names, I think you can get to the table(s) you want by addressing each one by its position (index), as you would an array entry.

<assign fromexpr="$result//xhtml:table[1]"  outputvariable="myfirsttable" />

<assign fromexpr="$result//xhtml:table[8]"  outputvariable="mysecondtable" />
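
One XPath detail to keep in mind with positional addressing: //xhtml:table[1] selects every table that is the first table child of its parent, so a page with nested tables can return more than one match. Wrapping the path in parentheses counts tables in document order instead. A small sketch, reusing $result and the xhtml prefix from the lines above (the output variable names are hypothetical):

```xml
<!-- First and eighth table in document order, regardless of nesting -->
<assign fromexpr="($result//xhtml:table)[1]" outputvariable="firstTableInPage"/>
<assign fromexpr="($result//xhtml:table)[8]" outputvariable="eighthTableInPage"/>
```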

 

apolenur

The following is a mashup that clips tracking information from the FedEx web site.

It is a little dated, so it might not work if FedEx has changed the format of their page. Hopefully they have not. In any case, it should give you a starting point.

<mashup name="FedexTrackingService" xmlns="http://www.jackbe.com/2007-04-10/JMMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.jackbe.com/2007-04-10/JMMLSchema ../schema/JMMLSpec.xsd">
    <operation name="track">
        <inputparam name="fedexTrackingNumber" type="string"/>
        <outputparam name="result" type="document">
            <![CDATA[
                <FedExTrackingInfo/>
            ]]>
        </outputparam>

        <variables>
            <variable name="searchresult" type="document"/>
            <variable name="fedexUrl" type="string"
                      default="http://www.fedex.com/Tracking?ascend_header=1&amp;clienttype=dotcom&amp;cntry_code=us&amp;language=english&amp;tracknumbers="/>
            <variable name="activity" type="string" default=""/>
        </variables>

        <namespaces>
            <namespace prefix="xhtml" uri="http://www.w3.org/1999/xhtml"></namespace>
        </namespaces>

        <!-- Build the tracking URL and fetch the page -->
        <assign outputvariable="fedexUrl" fromexpr="concat($fedexUrl,$fedexTrackingNumber)"/>
        <display message="fedexUrl...." expr="$fedexUrl"/>
        <externalinvoke outputvariable="searchresult" endpoint="$fedexUrl"/>

        <!-- Each summary value is the third td found two levels above the matching bold label -->
        <display message="Destination...." expr="$searchresult//xhtml:b[. = 'Destination']/../../xhtml:td[3]/string()"/>
        <display message="Tracking number...." expr="$searchresult//xhtml:b[. = 'Tracking number']/../../xhtml:td[3]/string()"/>
        <display message="Ship date...." expr="$searchresult//xhtml:b[. = 'Ship date']/../../xhtml:td[3]/string()"/>
        <display message="Status...." expr="$searchresult//xhtml:b[. = 'Status']/../../xhtml:td[3]/string()"/>

        <!-- Copy the clipped summary values into the result document -->
        <appendresult outputvariable="result">
            <![CDATA[
                <trackingNumber>{$searchresult//xhtml:b[. = 'Tracking number']/../../xhtml:td[3]/string()}</trackingNumber>
            ]]>
        </appendresult>

        <appendresult outputvariable="result">
            <![CDATA[
                <shipDate>{$searchresult//xhtml:b[. = 'Ship date']/../../xhtml:td[3]/string()}</shipDate>
            ]]>
        </appendresult>

        <appendresult outputvariable="result">
            <![CDATA[
                <deliveryDate>{$searchresult//xhtml:b[. = 'Delivery date']/../../xhtml:td[3]/string()}</deliveryDate>
            ]]>
        </appendresult>

        <appendresult outputvariable="result">
            <![CDATA[
                <status>{$searchresult//xhtml:b[. = 'Status']/../../xhtml:td[3]/string()}</status>
            ]]>
        </appendresult>

        <!-- Loop over rows 3 to 11 below the 'Activity' header cell -->
        <foreach variable="transit" items="$searchresult//xhtml:td[@class='subheaderwhite1'][. = 'Activity']/../..//xhtml:tr[position() = 3 to 11]">
            <assign outputvariable="activity" fromexpr="$transit//xhtml:td[7]/child::text()"/>

            <!-- If the cell has no direct text, fall back to its bold text -->
            <if condition="string-length(normalize-space($activity)) = 0 ">
                <assign outputvariable="activity" fromexpr="$transit//xhtml:td[7]/xhtml:b/string()"/>
            </if>

            <appendresult outputvariable="result">
                <![CDATA[
                    <transit>
                        <date>{$transit//xhtml:td[2]/string()}</date>
                        <time>{$transit//xhtml:td[3]/string()}</time>
                        <activity>{$activity}</activity>
                        <city>{$transit//xhtml:td[11]/string()}</city>
                    </transit>
                ]]>
            </appendresult>
        </foreach>
    </operation>
</mashup>

polly

Alexi's sample looks like it is doing a similar thing:

  <time>{$transit//xhtml:td[3]/string()}</time>

You address the cell by position (in this case, the table column you want). As in my example, you first address the table you want the same way.
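
Putting the two steps together, a minimal sketch that picks a table by position and then a cell inside it might look like the following. It assumes $result holds the fetched page and that the xhtml prefix is declared as in the FedEx sample; the table, row and column numbers and the cellText variable are placeholders to adjust for your page.

```xml
<!-- Text of row 2, column 3 in the eighth table of the page (all positions are placeholders) -->
<assign fromexpr="($result//xhtml:table)[8]//xhtml:tr[2]/xhtml:td[3]/string()" outputvariable="cellText"/>
```

If the chosen table contains nested tables, //xhtml:tr[2] can match more than one row, so it may be worth displaying the expression first to confirm it returns a single cell.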