enlive

Enlive (cgrand/enlive on GitHub): a selector-based (à la CSS) templating and transformation system for Clojure.

HTML extraction of MLA references with Enlive and Clojure

My objective is to extract and parse a series of bibliographical references from a webpage for entry into a database later. The references are all in MLA format. This should be a general solution, for all instances of MLA-format bibliographies, and should work on more than just the webpage indicated below.

Here is my attempt, which doesn't work:

(use '[net.cgrand.enlive-html])

(def ^:dynamic *base-url* "https://www.impacttest.com/research/?Clinical-Research-Database-4")
(def ^:dynamic *ref-selector*     [:div#content_1 :ul :li])


(defn fetch-url [url]
  (html-resource (java.net.URL. url)))

(defn references []
  (select (fetch-url *base-url*) *ref-selector*))

(def ^:dynamic *ref-regex*    #"\s([A-Z]{1}[\w|\s]+)[,|\.]")
(def ^:dynamic *ref-modifier* `(remove :content))

(defmacro extract-re [node re modifier]
  `(doseq [seqs (map :content (node))]
    (re-find re (apply str (modifier seqs)))))

(extract-re references *ref-regex* *ref-modifier*)

(macroexpand-1 '(extract-re references *ref-regex* *ref-modifier*))

I would like the macro extract-re to create a doseq that runs a regex matcher (re-find) on all of the enlive nodes. There are two variables that need to change: one is the regex itself, and the other is the modifier, which modifies the enlive node before it's processed. Without the modifier, the regex will match both the authors and some titles. I tried writing a function, but couldn't get it to work in a general case, so I think a macro is the way to go.

On MLA references, I think it's easier to use the modifier on the enlive node than to do all of the extraction with regex, although I may be wrong on that. I can't think of how to do a regex that will only match the title or only the authors.

So, how do I pass the modifier to the macro and have it execute properly? I don't fully understand the quoting details of macros, so I may be way off on how I wrote the macro to begin with, or even if a macro is necessary.
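
One way to look at the question above: since both the regex and the modifier are ordinary values, a plain function may be enough and no macro is needed. The sketch below is only an illustration under assumptions. The sample nodes are hypothetical stand-ins for what select returns (maps with :tag, :attrs and :content), and the regex is a simplified placeholder, not a general MLA pattern. It also uses map instead of doseq, because doseq always returns nil and would discard the matches.

```clojure
;; Hypothetical stand-ins for Enlive nodes (maps with :tag, :attrs, :content).
(def sample-nodes
  [{:tag :li :attrs nil
    :content ["Smith, J. " {:tag :i :attrs nil :content ["A Title."]} " 2010."]}
   {:tag :li :attrs nil
    :content ["Jones, A. " {:tag :i :attrs nil :content ["Another Title."]} " 2011."]}])

(defn extract-re
  "Applies modifier to each node's :content, joins the result into one
  string and runs re-find on it. Returns a lazy seq of matches."
  [nodes re modifier]
  (map (fn [node]
         (re-find re (apply str (modifier (:content node)))))
       nodes))

;; The modifier is passed as a function value. This one drops child
;; elements (anything with :content) and keeps the bare text nodes.
(def text-only (partial remove :content))

;; Placeholder regex: the first word of the remaining text.
(extract-re sample-nodes #"^\w+" text-only)
```

With real data, the first argument would be the result of (references) rather than sample-nodes.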


Source: (StackOverflow)

Broken links when deploying Clojure webapps to Jetty with relative links and non-root context path

I've been experimenting with writing webapps in Clojure, and it's been pretty easy until now. I followed Chas Emerick's excellent screencast Starting Clojure and got a URL shortener up and running pretty quickly. Next I wanted to be able to deploy it, and that's when the trouble started.

When I run it in development or deploy it to Jetty as the root webapp, everything is fine, but when I deploy it with a context path, it breaks. Or, rather, it almost works. All my Compojure routes still work, but FORM action links in HTML files are broken and give me 404s.

This is the Compojure route setup:

  (defroutes app*
    (rt/resources "/")
    (GET "/" request (homepage request))
    (POST "/shorten" request
          (let [id (shorten (-> request :params :url))]
            (response/redirect "/")))
    (GET "/:id" [id] (redirect id)))

  (def app (compojure.handler/site app*))

And here is the HTML for the homepage template:

<!DOCTYPE html>
<html>
<head>
    <title>Insert title here</title>
    <link type="text/css" rel="stylesheet" href="site.css" />
</head>
<body>
    <form method="POST" action="shorten">
        <input type="text" name="url" />
        <input type="submit" value="Shorten!" />
    </form>
</body>
</html>

The problem is the action="shorten" URL. When deployed to Jetty with a context path of /example, everything works fine until I trigger the form submit. Then Jetty complains that it can't find localhost:8080/shorten, which means (I think) that it's not being treated as a relative path, but as an absolute one.

So, my question is: how to fix this? I guess I could just specify the full path in the action link, but that would be inflexible and make it harder to run the servlet in development. Is there a way to configure my way out of this? Or some magic URL prefix (like ~/ in Razor) that will just do the right thing?


Source: (StackOverflow)

How to generate a list of a pair of elements in Enlive?

I'm new to Enlive. I found that I can iterate with clone-for, but it works on a single element. I want to generate a list where each item is a pair of elements, like the following:

<div>
  <a href="url1">item 1</a><br>
  <a href="url2">item 2</a><br>
  ...
</div>

I tried to select <a> and use clone-for, but ended up with the following result:

<div>
  <a href="url1">item 1</a><a href="url2">item 2</a>......<br>
</div>

What do I do to repeat <a> with <br> in each iteration?


Source: (StackOverflow)

Enlive - extract original HTML

Is it possible to retrieve the original HTML (with its quirks and formatting) using enlive selectors?

(def data "<div class=\"foo\"><p>some text <br> some more text</p></div>") 
(apply str 
    (enlive/emit* (enlive/select (enlive/html-snippet data) 
                                 [:.foo :> enlive/any-node])))

=> "<p>some text <br /> some more text</p>"

In this example, enlive has transformed the <br> tag into a self-closing tag, unlike the original input snippet.

I suspect that enlive is transforming it into a hiccup-like list of tags, such that the original information is unfortunately lost.


Source: (StackOverflow)

Enlive template auto-reload / detect changes in a Pedestal service

I am using the autoreload-server example which is working great for reloading namespaces on changes to the .clj files using ns-tracker.

https://github.com/pedestal/samples/blob/master/auto-reload-server/dev/dev.clj

However, it is not picking up changes to enlive templates in the resources/public dir. I've added my template paths to the vector in defn watch:

`([] (watch ["src" "resources" "resources/public" "public"]))`

As well as this in the namespaces that use enlive deftemplate:

(net.cgrand.reload/auto-reload *ns*)

However this does not work. My assumption is ns-tracker only works for clj files, and that I am using the enlive reload feature incorrectly.

Has anyone using enlive figured this out, or does anyone have ideas to try?


Source: (StackOverflow)

How to scrape data from specified tag with Enlive?

Could someone explain how to scrape the content of the <td> tags in the row whose <th> contains "Row1 title" (in this case the text I need to match is inside a <b> tag), without also scraping the <th> tag or any of its content? Here is my test HTML:

<table class="table_class"> 
                    <tbody> 
                       <tr> 
                         <th>
                           <b>
                              Row1 title
                           </b>
                         </th> 
                         <td>2.660.784</td> 
                         <td>2.944.552</td> 
                         <td>Correct, has 3 td elements</td> 
                       </tr> 
                       <tr> 
                         <th>                                
                              Row2 title                                
                          </th> 
                         <td>2.660.784</td> 
                         <td>2.944.552</td> 
                         <td>Correct, has 3 td elements</td> 
                       </tr> 
                    </tbody>
</table>

Data which I want to extract should come from these tags:

                     <td>2.660.784</td> 
                     <td>2.944.552</td> 
                     <td>Correct, has 3 td elements</td> 

I have managed to create a function which returns the entire content of the table, but I would like to exclude the <th> node from the result and return only the data from the <td> nodes, whose content I can use for further parsing. Can anyone help me with this?
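
The selection described above can be sketched in plain Clojure over the node maps that Enlive produces. This is a hedged illustration, not the definitive Enlive idiom: node-text, child-elements and cells-for-row are hypothetical helper names, and the rows literal is a hand-written stand-in for the parsed :tr nodes of the test table.

```clojure
(require '[clojure.string :as str])

(defn node-text
  "Concatenates all text found under a node (a string or a map with :content)."
  [node]
  (if (string? node)
    node
    (apply str (map node-text (:content node)))))

(defn child-elements
  "Returns the direct children of node that have the given tag."
  [node tag]
  (filter #(= tag (:tag %)) (:content node)))

(defn cells-for-row
  "Given a seq of :tr nodes, returns the text of every :td in the rows
  whose :th text (trimmed, including nested tags such as <b>) equals title."
  [rows title]
  (for [row rows
        :let [th (first (child-elements row :th))]
        :when (and th (= title (str/trim (node-text th))))
        td (child-elements row :td)]
    (node-text td)))

;; Hand-written stand-in for the parsed rows of the test table.
(def rows
  [{:tag :tr :attrs nil
    :content ["\n"
              {:tag :th :attrs nil
               :content ["\n" {:tag :b :attrs nil :content ["\n  Row1 title\n"]} "\n"]}
              {:tag :td :attrs nil :content ["2.660.784"]}
              {:tag :td :attrs nil :content ["2.944.552"]}
              {:tag :td :attrs nil :content ["Correct, has 3 td elements"]}]}
   {:tag :tr :attrs nil
    :content [{:tag :th :attrs nil :content ["\n  Row2 title\n"]}
              {:tag :td :attrs nil :content ["other"]}]}])

(cells-for-row rows "Row1 title")
```

With a real page, rows would presumably come from an Enlive select over the table's :tr elements; the <th> never appears in the result because only :td children are collected.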


Source: (StackOverflow)

Clojure: How do you transform a lazyseq of map entries into a structmap?

I'm new to clojure and have been working with enlive to transform text nodes of html documents. My end goal is to convert the structure back into html, tags and all.

I'm currently able to take the structmap returned by enlive-html/html-resource and transform it back to html using

(apply str (html/emit* nodes))

where nodes is the structmap.

I'm also able to transform the structmap's :content text nodes as I wish. However, after transforming the content text nodes of the structmap, I end up with a lazyseq of MapEntries. I want to transform this back into a structmap so I can use emit* on it. This is a little tricky because the lazyseqs & structmaps are nested.

tldr:

How do I transform:

([:tag :html]
 [:attrs nil]
 [:content
  ("\n"
   ([:tag :head]
    [:attrs nil]
    [:content
     ("\n  "
      ([:tag :title] [:attrs nil] [:content ("Page Title")])
      "  \n")])
   "\n"
   ([:tag :body]
    [:attrs nil]
    [:content
     ("\n  "
      ([:tag :div]
       [:attrs {:id "wrap"}]
       [:content
        ("\n    "
         ([:tag :h1] [:attrs nil] [:content ("header")])
         "\n    "
         ([:tag :p] [:attrs nil] [:content ("some paragrah text")])
         "\n  ")])
      "\n")])
   "\n\n")])

into:

    {:tag :html,
 :attrs nil,
 :content
 ("\n"
  {:tag :head,
   :attrs nil,
   :content
   ("\n  " {:tag :title, :attrs nil, :content ("Page Title")} "  \n")}
  "\n"
  {:tag :body,
   :attrs nil,
   :content
   ("\n  "
    {:tag :div,
     :attrs {:id "wrap"},
     :content
     ("\n    "
      {:tag :h1, :attrs nil, :content ("header")}
      "\n    "
      {:tag :p, :attrs nil, :content ("some paragrah text")}
      "\n  ")}
    "\n")}
  "\n\n")}
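
For the conversion itself, a minimal recursive sketch is possible with into, since a sequence of key/value pairs can be poured back into a map. entries->node is a hypothetical name, and the sample data below is a shortened version of the structure shown above.

```clojure
(defn entries->node
  "Turns a (possibly nested) seq of [key value] pairs back into a node map.
  Strings are text nodes and are returned as they are."
  [x]
  (if (string? x)
    x
    (let [node (into {} x)]
      (if-let [children (:content node)]
        (assoc node :content (map entries->node children))
        node))))

;; Shortened sample in the same shape as the input above.
(def entries
  '([:tag :p]
    [:attrs nil]
    [:content ("some "
               ([:tag :b] [:attrs nil] [:content ("text")]))]))

(entries->node entries)
```

The same function accepts real MapEntry objects, because into treats them like two-element vectors.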

Update

kotarak's response pointed me in the direction of update-in, which I was able to use to modify the map in place without transforming it to a sequence, thus rendering my question irrelevant.

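;; modify-text is the function (not shown) that transforms a single text node.
(declare update-content)

(defn modify-or-go-deeper
  "If item is a map, updates its content; if it's a string, modifies it;
  anything else is returned unchanged."
  [item]
  (cond
    (map? item) (update-content item)
    (string? item) (modify-text item)
    :else item))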

(defn update-content
  "Calls modify-or-go-deeper on each element of the :content sequence"
  [coll]
  (update-in coll [:content] (partial map modify-or-go-deeper)))

I was using for on the map before, but update-in is the way to go.


Source: (StackOverflow)

Clojure: partly change an attribute value in Enlive

I have this test.html file that contains:

<div class="clj-test class1 class2 col-sm-4 class3">content</div>

I want to define a template that changes only part of an HTML attribute value:

(deftemplate test "public/templates/test.html" []
  [:.clj-test] (enlive/set-attr :class (partly-change-attr #"col*" "col-sm-8")))

This would render:

...
<div class="clj-test class1 class2 col-sm-8 class3">content</div>
...

Thanks for your help!
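
A possible shape for such a helper, sketched in plain Clojure on a node map: replace-class is a hypothetical name, not an Enlive function. Note also that the regex #"col*" matches "co" followed by any number of "l" characters, so the sketch uses #"col-.*" to match a whole class token instead.

```clojure
(require '[clojure.string :as str])

(defn replace-class
  "Replaces every class token of node that fully matches re with replacement.
  Nodes without a class attribute are returned unchanged."
  [node re replacement]
  (if-let [classes (get-in node [:attrs :class])]
    (assoc-in node [:attrs :class]
              (->> (str/split classes #"\s+")
                   (map #(if (re-matches re %) replacement %))
                   (str/join " ")))
    node))

;; Hand-written stand-in for the parsed <div> from test.html.
(def div-node
  {:tag :div
   :attrs {:class "clj-test class1 class2 col-sm-4 class3"}
   :content ["content"]})

(replace-class div-node #"col-.*" "col-sm-8")
```

Since an Enlive transformation is a function from a node to a node, something like #(replace-class % #"col-.*" "col-sm-8") could presumably take the place of the set-attr call in the template; that usage is an assumption and is not tested here.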


Source: (StackOverflow)

Variable HTML template in Enlive

I'm trying to find an 'Enlivonic' way of defining a function that will perform a transformation on a parameterised html template.

In other words, how do I define something like a defsnippet that also takes the template as an argument?

I looked at at, snippet and transformation, but I'm a little lost in the macros :-(


Source: (StackOverflow)

How to reload templates when working with enlive without restarting

I am using enlive for my web development. I start my ring server in repl using (serve my-app/handler)

However, when I make changes to any of my HTML templates I have to restart my REPL for the changes to show up. How do I reload my markup without restarting the REPL?

Thanks, Murtaza


Source: (StackOverflow)

combine multiple html fragment files with enlive, clojure

I have multiple HTML files that are to be combined into a single HTML file. These files are fragments such as the header and footer, which are common to multiple pages. I'm using enlive's html-resource function, but it inserts the missing html tags into the final file, which I don't want.

Following is the output map,

({:tag :html, :attrs nil, :content (
 {:tag :head, :attrs nil, :content (
 {:tag :meta, :attrs {:content text/html; charset=utf-8, :http-equiv Content-Type}, :content ()} 
 {:tag :title, :attrs nil, :content (HewaniLife | Changing The Way You Live)} 
 {:tag :link, :attrs {:href styles/main.css, :rel stylesheet, :type text/css}, :content ()} )} 

 {:tag :body, :attrs nil, :content (
 {:tag :html, :attrs nil, :content ({:tag :body, :attrs nil, :content ({:tag :div, :attrs {:id header}, :content (
 {:tag :h1, :attrs nil, :content ({:tag :a, :attrs {:href index.xhtml, :id logo}, :content (
 {:tag :span, :attrs {:class img-replace}, :content (hewaniLife)})})} 

 {:tag :div, :attrs {:id main-nav}, :content (
 {:tag :ul, :attrs nil, :content (
 {:tag :li, :attrs nil, :content ({:tag :a, :attrs {:href login.xhtml, :id btn-login}, :content (
 {:tag :span, :attrs {:class img-replace}, :content (Login)})})} 
 {:tag :li, :attrs nil, :content ({:tag :a, :attrs {:href index.xhtml, :id btn-home}, :content (
 {:tag :span, :attrs {:class img-replace}, :content (Home)})})} 
 {:tag :li, :attrs nil, :content ({:tag :a, :attrs {:href search.xhtml, :id btn-search}, :content (
 {:tag :span, :attrs {:class img-replace}, :content (Search)})})})})} 
 {:type :comment, :data  end of div#main-nav } 
 {:tag :br, :attrs {:class clear-all}, :content nil})} {:type :comment, :data  end of div#header })})})})}

Here you can see that the html tags end up nested when I insert the files.

Is there any way to insert these files without that happening?

Has anybody used another method for this?


Source: (StackOverflow)

Append to an attribute in Enlive

Is it possible to append a value to an attribute using enlive?

example: I have this

<a href="/item/edit/">edit</a>

and would like this

<a href="/item/edit/123">edit</a>

I am currently doing this:

(html/defsnippet foo "views/foo.html" [:#main]
  [ctxt]
  [:a] (html/set-attr :href (str "/item/edit/" (ctxt :id))))

But I would prefer not to embed the URL in my code, and instead just append the id to the existing URL, with something like:

(html/defsnippet foo "views/foo.html" [:#main]
  [ctxt]
  [:a@href] (html/append (ctxt :id)))
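
Because a node's attributes live in a plain map under :attrs, appending can be sketched with update-in and str. append-to-attr is a hypothetical helper, not part of Enlive, and the node literal stands in for the parsed <a> element.

```clojure
(defn append-to-attr
  "Appends suffix to the current value of attr on node.
  A missing attribute is treated as an empty string, because (str nil x) is x."
  [node attr suffix]
  (update-in node [:attrs attr] str suffix))

;; Hand-written stand-in for the parsed <a> element.
(def edit-link
  {:tag :a :attrs {:href "/item/edit/"} :content ["edit"]})

(append-to-attr edit-link :href 123)
```

Inside the snippet, a transformation such as #(append-to-attr % :href (ctxt :id)) on the [:a] selector would presumably keep the URL in the template; that wiring is an assumption rather than something verified here.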

Source: (StackOverflow)

Extracting consecutive html fragments with enlive

I need to scrape html that has the following form:

<div id='content'>
    <h3>Headline1</h3>
    <div>Text1</div>
    <div>Text2</div>
    <div>Text3</div>
    <h3>Headline2</h3>
    <div>Text4</div>
    <div>Text5</div>
    <h3>Headline3</h3>
    <div>Text6</div>
    <div>... and so on ...</div>
</div>

I need to get the content between the headline tags as separate chunks. So from one headline up to the next. Unfortunately there is no container tag for the desired ranges.

I tried the fragment selector {[:h3] [:h3]} but somehow this only returns all h3 tags, without the tags in between them: (({:tag :h3, :attrs nil, :content ("Headline1")}) ({:tag :h3, :attrs nil, :content ("Headline2")}) ({:tag :h3, :attrs nil, :content ("Headline3")}))

What does work, is {[[:h3 (html/nth-of-type 1)]] [[:h3 (html/nth-of-type 2)]]}. This gives me all of the html between the first and second h3-tag. However this does not give me all of the desired chunks with one selector.

Can enlive do this at all or should I resort to a regular expression?

Thanks!
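
Independently of fragment selectors, the grouping can be done in plain Clojure once the children of div#content are available as a sequence. The sketch below is one possible approach under assumptions: split-at-headlines is a hypothetical name, and children is a hand-written stand-in for the parsed child nodes, including the whitespace text nodes a parser would typically leave between tags.

```clojure
(defn split-at-headlines
  "Groups sibling nodes into chunks, starting a new chunk at every :h3.
  Nodes that appear before the first :h3 are dropped."
  [nodes]
  (reduce (fn [chunks node]
            (cond
              (= :h3 (:tag node)) (conj chunks [node])
              (empty? chunks)     chunks
              :else               (conj (pop chunks) (conj (peek chunks) node))))
          []
          nodes))

;; Hand-written stand-in for the children of div#content.
(def children
  ["\n"
   {:tag :h3 :attrs nil :content ["Headline1"]}
   {:tag :div :attrs nil :content ["Text1"]}
   {:tag :div :attrs nil :content ["Text2"]}
   {:tag :h3 :attrs nil :content ["Headline2"]}
   {:tag :div :attrs nil :content ["Text4"]}])

(split-at-headlines children)
```

Each chunk begins with its headline node, so the headline text stays attached to the content that follows it.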


Source: (StackOverflow)

clojure, enlive, multi-site

Trying to load a particular template based on what :server-name returns in the request:

(ns rosay.views.common
  (:use noir.core)
  (:require [noir.request :as req]
            [clojure.string :as string]
            [net.cgrand.enlive-html :as html]))

(defn get-server-name
  "Pulls servername for template definition"
  []
  (or (:server-name (req/ring-request)) "localhost"))

(defn get-template
  "Grabs template name for current server"
  [tmpl]
  (string/join "" (concat [(get-server-name) tmpl])))

(html/deftemplate base (get-template "/base.html")
  []
  [:p] (html/content (get-template "/base.html")))

It works for localhost, which returns /home/usr/rosay/resources/localhost/base.html. But when I test against a different host, say "hostname2", I can see that get-template is looking at /home/usr/rosay/resources/hostname2/base.html, yet the page rendered in the browser always points back to ../resources/localhost/base.html.

Is there a macro or different way to handle this use-case?


Source: (StackOverflow)

Binding a local var in deftemplate for enlive

I'm brand new to clojure and the web development stack. I'm trying to use enlive to set values in an HTML template:

(en/deftemplate project-main-page
  (en/xml-resource "project-main.html")
  [id]
  [:#project-name] (en/content (str "Name: " ((get-project id) :name)))
  [:#project-desc] (en/content (str "Desc: " ((get-project id) :desc))))

This works fine to set my two HTML elements, but it involves a repeated call to my function get-project. At the moment this just reads from a local map, but eventually it will involve some external storage access, so I'd prefer to just perform it once in this function.

I was thinking of using let:

(en/deftemplate project-main-page
  (en/xml-resource "project-main.html")
  [id]
  (let [project (get-project id)]
    [:#project-name] (en/content (str "Name: " (project :name)))
    [:#project-desc] (en/content (str "Desc: " (project :desc)))))

But this only affects the description element and ignores the name forms.

What is the best way to bind a local var within deftemplate?


Source: (StackOverflow)