209 Commits (df2bf9ef2828e0e2d2d0eb02ed84dce04f9b6fc7)

Author | SHA1 | Message | Date
luccioman | d16bc99835 | Added "Show Metadata" links to the ViewFile.html links mode | 6 years ago
luccioman | e115e57cc7 | Reduced text snippet extraction processing time. | 7 years ago
luccioman | 90dc580158 | Fixed initial ViewFile mode and suggestions links from previous commit | 7 years ago
luccioman | 0b6aed4de6 | Keep the selected view mode when typing a new URL in the ViewFile page | 7 years ago
luccioman | 8100c033a2 | URL Viewer : apply crawler size limits when adding to local index. | 7 years ago
luccioman | 1b3c169a9c | URL Viewer : decode raw text using the eventual response charset. | 7 years ago
reger | c77e43a391 | Take out mailto collect in internal parsed document | 8 years ago
luccioman | 5b5b9d5d96 | URL Viewer : only display the link to metadata when metadata exists | 8 years ago
luc | 9f712146df | Display icons in ViewFile "links" mode. | 9 years ago
reger | dec3e6ad96 | fix: adjust urlstub for mailto links | 9 years ago
reger | 0c5548a7ff | fix (todo) remove redundant holding of email link nameproperty in parser document | 9 years ago
reger | 71c416f383 | show mailto links in ViewFile.html linklist | 9 years ago
reger | 1160b13172 | remove unused md5 from ViewFile servlet params | 9 years ago
Michael Peter Christen | dbbad23e12 | removed warnings | 9 years ago
reger | 821262a179 | add CommonPattern for multiple spaces | 10 years ago
reger | 17e820cfd7 | use doctype() in ViewFile to choose display routines | 10 years ago
reger | 66d0b5046a | fix NPE on viewfile of url not in index | 10 years ago
Michael Peter Christen | 68c605d637 | replace with CommonPattern.SPACE for split | 10 years ago
Michael Peter Christen | 69eacdf4eb | applying precompiled CommonPattern.COMMA.split to all places where | 10 years ago
reger | 198102304b | refactor size() -> filesize() of URIMetadataNode | 10 years ago
reger | ff18129def | ViewFile servlet: update index if newer, | 10 years ago
Michael Peter Christen | 0a879c98e7 | added new 'firstSeen' database table and necessary data structures which | 10 years ago
reger | 54019313e7 | fix NPE in ViewFile - show snippet | 10 years ago
Michael Peter Christen | 98f45c9032 | fix for image alt attachment to AnchorURLs in html parser. | 10 years ago
Michael Peter Christen | c115f3869c | enhanced snippet computation and test method in ViewFile | 10 years ago
Michael Peter Christen | 6e1dc444c3 | added a snippet test function in ViewFile: you can now search for a | 10 years ago
Michael Peter Christen | 2de159719b | added an option to set 'obey nofollow' for links with rel="nofollow" | 10 years ago
Michael Peter Christen | 6e59ca4ebf | removed jena library and all code that depended on jena. When jena was | 11 years ago
Michael Peter Christen | 9bb7eab389 | hacks to prevent storage of data longer than necessary during search and | 11 years ago
Michael Peter Christen | 61c5e40687 | - replaced the properties object in AnchorURL with distinct variables | 11 years ago
Michael Peter Christen | 5e31bad711 | - the webgraph shall store all links which appear on a web page and not | 11 years ago
Michael Peter Christen | 5d71a4c8bc | fix for dc:description field | 11 years ago
Michael Peter Christen | 765943a4b7 | Redesign of crawler identification and robots steering. A non-p2p user | 11 years ago
Michael Peter Christen | 76afcccaaf | fix for default boolean post values: the default value MUST NOT be TRUE, | 11 years ago
Michael Peter Christen | 4c242f9af9 | always use a default value for boolean options to have transparency for | 11 years ago
Michael Peter Christen | bcc623a843 | refactoring of load_delay: this is a matter of client identification | 12 years ago
Michael Peter Christen | 16d1d744fa | added url_file_name_s in default collection schema for the file name | 12 years ago
Michael Peter Christen | 0600d510e1 | show the citation report also in ViewFile | 12 years ago
Michael Peter Christen | 1a92b61d69 | fixed usage of ViewFile which needs a commit before showing latest crawl | 12 years ago
Michael Peter Christen | 8f2d3ce2f9 | reduced locking situation in crawler: shifted synchronized location and | 12 years ago
Michael Peter Christen | 788288eb9e | added the generation of 50 (!!) new solr field in the core 'webgraph'. | 12 years ago
Michael Peter Christen | 16d90859b7 | reverted put-semantics back to as-usual in serverObjects and introduced | 12 years ago
Michael Peter Christen | 43f3345c90 | - removed dependencies from URIMetadataRow and made direct access to | 12 years ago
Michael Peter Christen | 5f0ab25382 | removed the option to prevent removal of & parts inside of the | 12 years ago
Michael Peter Christen | 554db5608b | fix for ViewFile | 12 years ago
Michael Peter Christen | 1533bfd63b | refactoring | 12 years ago
Michael Peter Christen | 00c1c777fa | refactoring | 12 years ago
Michael Peter Christen | d8425e6809 | added collections to crawl monitor | 12 years ago
Michael Peter Christen | 0cab06c47c | refactoring | 12 years ago
Michael Peter Christen | 18f989dfb1 | - refactoring (load -> getMetadata) | 12 years ago