Commit Graph

984 Commits (fe917deb2d7668d59496053ff08f90d78a2b888b)

Author SHA1 Message Date
Michael Peter Christen 6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
10 years ago
Michael Peter Christen b44626e55b fixed target_alt_t in webgraph
10 years ago
Michael Peter Christen 504327b15c fix for condition for writing the webgraph
10 years ago
Michael Peter Christen 542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
10 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
10 years ago
reger f96cfdc84d prevent array out of bound exception on getRankingProfile(x)
10 years ago
reger a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
10 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
10 years ago
Michael Peter Christen bf1b6b93e7 do not write CR values to webgraph if no CR values are computed
10 years ago
Michael Peter Christen d07cdd8c3b added SolrCloud access mode and configuration
10 years ago
Michael Peter Christen 8514bffc22 enhanced postprocessing status report
10 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
11 years ago
Michael Peter Christen 62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
11 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling
11 years ago
Michael Peter Christen fd87fa1613 removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen f2b476e08b don't do a double check to solr for failed documents if they are not
11 years ago
orbiter dab9a0786a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 51bf5c85b0 Renamed the transmission cloud to buffer in dispatcher since the name
11 years ago
Michael Peter Christen fb3dd56b02 fix for processing of noindex flag in http header
11 years ago
Michael Peter Christen b0d941626f fixed bugs in canonical, robots and title/description unique calculation
11 years ago
reger d9472d043a cleanup older unused classes
11 years ago
reger 665e12f88e move startup time from old serverCore to switchboard (most used here)
11 years ago
reger 336425912a remove unused localSearchThread from SearchEvent
11 years ago
Michael Peter Christen 1092e798a5 fixed double content postprocessing
11 years ago
orbiter 59160984cc timeline performance update
11 years ago
orbiter 2073e69034 fix for long periods in timeline
11 years ago
Michael Peter Christen 09dcdb9b19 update to solr 4.9.0
11 years ago
Michael Peter Christen 1cd4b2e8be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 8c52f0651b refactoring of AccessTracker events & timeline fix
11 years ago
reger 431a5f9c4e added test case for TextSnippet,
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen f5b817bac4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger a5707cd2eb enable proper Author navigator
11 years ago
Michael Peter Christen 74206a10c7 refactoring
11 years ago
orbiter fec673c9d1 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter c59da9fe7a added access tracker log reader stub
11 years ago
Michael Peter Christen 36e623d8bf enhanced metadata enrichment for media file type search:
11 years ago
Michael Peter Christen b893c42a0f bugfix for image search
11 years ago
orbiter 0bbb5040b8 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 9d5d86cd03 Added filter query options to the ranking servlet /RankingSolr_p.html.
11 years ago
Michael Peter Christen d2151857f1 Added collection navigation:
11 years ago
Michael Peter Christen 74c249288a added a push api to make it possible to upload files directly without
11 years ago
Michael Peter Christen ba6ffddefc refactoring
11 years ago
Michael Peter Christen 0c324d735c NPE fix for postprocessing without term index
11 years ago
Michael Peter Christen 922979aae1 added option to prefer http over https in unique-protocol ranking
11 years ago
Michael Peter Christen b3b174e2b8 fixed webgraph postprocessing and status display in Crawler_p servlet
11 years ago
Michael Peter Christen f23c4142e0 added option to configure a custom user agent within allip networks
11 years ago
Michael Peter Christen 8ad41a882c fixed several problems with postprocessing:
11 years ago
Michael Peter Christen ff5b3ac84d added new fields http_unique_b and www_unique_b which can be used for
11 years ago
Michael Peter Christen f0db501630 better handling of ranking parameters and new default values for date
11 years ago
Michael Peter Christen 53948da7d0 tried to make last_modified recognition smarter
11 years ago
Michael Peter Christen 6634b5b737 debug code for index distribution testing
11 years ago
orbiter 97983ba89f fixed generics warnings for generic array instantiation that appeared
11 years ago
sixcooler 830057d788 lower Segment-size (hope to get Segments of 10GB)
11 years ago
orbiter c028ae9b09 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
reger e31493e139 "Use remote proxy for yacy" has no function, remove option and related config item
11 years ago
orbiter 0d8072aa99 removed warnings
11 years ago
Michael Peter Christen a1ac4c3b76 automatically clear graphics cache
11 years ago
reger 1432a817dd respect "index media" switched off in CrawlStartExpert.html
11 years ago
Michael Peter Christen 4e734815e8 enhanced snippets: remove lines which are identical to the title and
11 years ago
Michael Peter Christen e84e07399a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger 8a7c68e4c7 content of surrogates/out never accessed (remove)
11 years ago
Michael Peter Christen 229f2248b8 added configuration option for maximum load and minimum ram for
11 years ago
orbiter 8e5ce7cd51 fixed a situation where finished crawls had not been detected.
11 years ago
orbiter ccb1864d55 catch IllegalArgumentException for wrong process types (that is needed
11 years ago
orbiter 4ee4ba1576 fix for NPE in IndexCreateParserErrors_p.html caused by bad handling of
11 years ago
reger 727dfb5875 refactor URIMetadataNode to further unify interaction with index
11 years ago
Michael Peter Christen 5746aae3db add canonical links to the same crawldepth, not the next crawldepth
11 years ago
Michael Peter Christen 74ab5ef9fa increased runtime for postprocessing query job
11 years ago
Michael Peter Christen 10cf8215bd added crawl depth for failed documents
11 years ago
Michael Peter Christen c2f62e783f - better subgraph handling, less overhead for crawls without the
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
Michael Peter Christen 075b6f9278 refactoring of the crawl balancer: the balancer is turned into an
11 years ago
Michael Peter Christen 8aeef73d49 fix for virtual root nodes
11 years ago
Michael Peter Christen 7c7fbb9818 find depth-matches also for edge targets
11 years ago
Michael Peter Christen dd12dd392f introduction of a data structure for HyperlinkEdges which should use
11 years ago
Michael Peter Christen 6ea8bb7348 using MultiProtocolURL for edge data which is faster (hash computation
11 years ago
Michael Peter Christen a37d067692 refactoring
11 years ago
orbiter 95780eed32 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen 67beef657f strong redesign of html parser: object recursion is now made using a
11 years ago
Michael Peter Christen 6bd8c6f195 fix for wrong status codes of error pages
11 years ago
orbiter 67501c9dda Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen 1c21b3256d fix for robots.txt handling: delete old entry before starting a new
11 years ago
orbiter c250fac9f4 linkstructure refactoring to get more options for clickdepth analysis
11 years ago
Michael Peter Christen 8068e68474 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen bd886054cb new structure and enhancements for link graph computation:
11 years ago
reger f326a67561 fix: typo in default charset in metadata2solr
11 years ago
Michael Peter Christen df138084c0 do solr optimization independently from memory and load constraints:
11 years ago
Michael Peter Christen ebd44a7080 replaced solr 4.6.1 with solr 4.7.1 and added index migration to
11 years ago
Michael Peter Christen 466d90ad42 fixed a problem with resource observer; probably coming from uncaught
11 years ago
Michael Peter Christen e8ddd415a8 enhanced the new link structure graph
11 years ago
Michael Peter Christen 926d28dd3f fixed a bug which prevented crawl starts after a network switch
11 years ago
Michael Peter Christen 3ce8eff21b another fix for inbound/outbound detection
11 years ago
orbiter 3c1274057d fixed thread dump in case of wrong seeds
11 years ago
orbiter 18f9c40302 moved Edge class out of linkstructure servlet as this does not work on
11 years ago
Michael Peter Christen c64c10ef00 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 48fbfa60c1 bugfix to inbound/outbound identification
11 years ago
reger 227c42bc96 eliminate obsolete URIMetaDataRow class
11 years ago
Michael Peter Christen cca851a417 introduced new solr field crawldepth_i which records the crawl depth of
11 years ago