Commit Graph

7399 Commits (509eba24849300c27c7f754dbe15049d25757582)

Author SHA1 Message Date
reger a6891ff7f8 fix Querygoal.parse exception on +/-null-term
11 years ago
reger c7335318eb remove unused legacy procedure from httpserver
11 years ago
Michael Peter Christen eab0d3e1a9 bugfix for wrong lock display, see
11 years ago
orbiter 49d4f95faf bugfix to latest commit
11 years ago
orbiter 68211f8244 enable Crawler_p servlet if a rss feed or a wiki dump import was
11 years ago
orbiter a65df4ce7e do not push noindex errors into log if in intranet mode. noindex
11 years ago
orbiter 688c6d8954 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 4ae7aead28 addon to latest fix
11 years ago
Marc Nause 2af56fa37d Improved UPnP. (still not perfect)
11 years ago
orbiter b3ebd38079 removed the HTDOCS repository concept because the concept to host files
11 years ago
reger 1fdcc2d67b change seedfile upload ip check to allow intranet ip in intranet mode
11 years ago
reger e31b0e6d67 - update javadoc Seed.getIP
11 years ago
reger 350c6b8250 in IntranetMode allow intranet hosted seedlist with Network_Domain "any"
11 years ago
orbiter d68438c3d9 make sure that the postprocessing background thread never dies by any
11 years ago
orbiter b4f2a1db6e added an unlock icon for all protected pages that are unlocked because
11 years ago
reger ea6c9e9b07 reduce mem buffer overhead for gap files during r/w
11 years ago
reger e88537522d allow single quote " ' " in query
11 years ago
orbiter 487021fb0a snippet computation update
11 years ago
orbiter 1c2f1f233a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
reger 5a4995ded3 fill solr rss writer dc:subject tag with keyword content
11 years ago
orbiter 927aaa95a6 concurrency bugfix
11 years ago
orbiter c9e593cf78 removed warnings
11 years ago
reger 7584352e7b use more predefined Solr query parameter constants
11 years ago
reger f9db5dd6c5 reduce doublecontent check document (prevent out of memory)
11 years ago
reger e9eae45b55 simplify rssreader and improve atom feed link extraction
11 years ago
reger a8508417d1 catch NPE during crawl (OAI import)
11 years ago
reger 3dde94422f center searchevent lines on network graph
11 years ago
Michael Peter Christen 3860711aef fix for possible interruption of concurrent queries
11 years ago
Michael Peter Christen 6344718f8b reducing the concurrent query stack size and reduced concurrency of
11 years ago
Michael Peter Christen eca9380e3d bugfix for crawler double-check: if an url is redirected, the
11 years ago
Michael Peter Christen 9ac0c93f17 fix for subpath crawl filter
11 years ago
Michael Peter Christen 66106bdaf0 fix for crawler attribute maxdompages
11 years ago
Michael Peter Christen 49d91b94c3 npe fix in crawler
11 years ago
Michael Peter Christen b7183a7321 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger ea2e627662 fix ConfigAccounts del user with uppercase letter in name
11 years ago
Michael Peter Christen c465b791af typo
11 years ago
Michael Peter Christen 191ec8c82a added concurrency to postprocess rewrite process
11 years ago
Michael Peter Christen a1e8bdd5e9 log ppm instead of docs/second
11 years ago
Michael Peter Christen cc0ded7abd set process type of web graph according to fields as defined in the
11 years ago
Michael Peter Christen 12fb9d7cd1 log postprocessing constraints in case that postprocessing is not
11 years ago
Michael Peter Christen 3c23b89823 less logging
11 years ago
Michael Peter Christen a0c53174c5 better solr query logging to detect unnecessary sort requests for more
11 years ago
Michael Peter Christen 338f574bdc no sorting if http/www unique fields are not demanded (makes query
11 years ago
Michael Peter Christen 1609763be5 toString fix
11 years ago
Michael Peter Christen b983e68254 more retries, less sleep
11 years ago
Michael Peter Christen 1503ba7794 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger 8f77719091 fix "Ljava.lang.String" in crawl queue anchor name
11 years ago
Michael Peter Christen 0ceeceb35e more logic on Solr queries; usage of the query terms in postprocessing,
11 years ago
orbiter 38864ae004 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 4099296b45 added new classes which shall reduce call overhead to Solr (stub)
11 years ago
reger d0c02e1de7 adjust rss lat/lon to double
11 years ago
orbiter 3491ab4c38 removed unused images from webgraph edge computation
11 years ago
orbiter 2371d6b8db target linktexts must be string to enable search facets on these fields
11 years ago
Michael Peter Christen 001e05bb80 do not store failure of loading of robots.txt into the index as a fail
11 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
11 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
11 years ago
Marc Nause 9df14fc126 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Marc Nause 477be17c51 Replaced old UPNP library with Weupnp. UPNP should
11 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
11 years ago
orbiter e9163e7e10 fix for malformed hostpath names in crawl balancer
11 years ago
Michael Peter Christen c115f3869c enhanced snippet computation and test method in ViewFile
11 years ago
reger 6c10b59f3e move bootstrap peers test systems to its test class
11 years ago
orbiter 1027f3d04a fix for the usage of ready-prepared solr queries, some queries are
11 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
11 years ago
Michael Peter Christen 6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
11 years ago
orbiter 4b06adb751 fix for file urls
11 years ago
orbiter 08409ec680 no idea why the words max was an ordered one. This change increases speed
11 years ago
reger e5854a5cdb fix localhost link to opensearchdescription.xml
11 years ago
Michael Peter Christen b44626e55b fixed target_alt_t in webgraph
11 years ago
Michael Peter Christen 504327b15c fix for condition for writing the webgraph
11 years ago
Michael Peter Christen 542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
11 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
11 years ago
reger c95ba52cf0 improve logexception info
11 years ago
orbiter e441831a24 reverted toString() change in AnchorURL to prevent mistakenly used
11 years ago
reger 47f201a6b8 Add Solr default query fields (&qf) to select servlet
11 years ago
reger f96cfdc84d prevent array out of bound exception on getRankingProfile(x)
11 years ago
reger 5f5fb4ecdc remove unused static (RSS)search from protocol
11 years ago
reger 7c1706d83a use CRLF in generated bat command scripts for windows
11 years ago
reger a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
11 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
11 years ago
Michael Peter Christen bf1b6b93e7 do not write CR values to webgraph if no CR values are computed
11 years ago
Michael Peter Christen e039e78210 small bugfixes
11 years ago
Michael Peter Christen 32a2ff925c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen d07cdd8c3b added SolrCloud access mode and configuration
11 years ago
Michael Peter Christen 8514bffc22 enhanced postprocessing status report
11 years ago
reger b24572f304 fix GSA filter query assignment
11 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
11 years ago
Michael Peter Christen 62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
11 years ago
Michael Peter Christen dd5cdfe212 reverted filter query hack, it did not work
11 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling
11 years ago
Michael Peter Christen 5326970d6c enhanced solr queries for single document extraction
11 years ago
Michael Peter Christen 525575bd97 added debugging of filter queries in thread dump thread names
11 years ago
Michael Peter Christen f319ef268f testing filter queries instead of queries to retrieve documents by id
11 years ago
Michael Peter Christen fd87fa1613 removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen f2b476e08b don't do a double check to solr for failed documents if they are not
11 years ago
Michael Peter Christen 06ab72d1af enhanced crawler host round-robin strategy
11 years ago
orbiter dab9a0786a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 51bf5c85b0 Renamed the transmission cloud to buffer in dispatcher since the name
11 years ago
Michael Peter Christen a694b6a8fc another fix for unique field computation
11 years ago
Michael Peter Christen fb3dd56b02 fix for processing of noindex flag in http header
11 years ago
Michael Peter Christen b0d941626f fixed bugs in canonical, robots and title/description unique calculation
11 years ago
reger d9472d043a cleanup older unused classes
11 years ago
reger 665e12f88e move startup time from old serverCore to switchboard (most used here)
11 years ago
reger 336425912a remove unused localSearchThread from SearchEvent
11 years ago
reger 32bd2a61c1 add local ip to AbstractRemoteHandler local hostname cache
11 years ago
Michael Peter Christen f3a6b6e21e fix for bad URL decoding
11 years ago
Michael Peter Christen 1092e798a5 fixed double content postprocessing
11 years ago
Michael Peter Christen aee5b108e5 added linkScraperParser, a parser which ignores the text like the
11 years ago
reger 2b8cc5832c fix seek error for 0 file size records file
11 years ago
reger 2ba394333f fix Crawler HostQueue release of stackfile
11 years ago
reger 40133ba2d0 fix NPE in Condenser,
11 years ago
orbiter 59160984cc timeline performance update
11 years ago
orbiter 54bea96e67 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen 841cc77391 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen e09218129c remove check for local solr. This check was made during a time when Solr
11 years ago
orbiter 2073e69034 fix for long periods in timeline
11 years ago
reger 1f94df29e7 fix NPE in solr rss where snippet contains only the title text
11 years ago
Michael Peter Christen 09dcdb9b19 update to solr 4.9.0
11 years ago
Michael Peter Christen 1cd4b2e8be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 8c52f0651b refactoring of AccessTracker events & timeline fix
11 years ago
reger 431a5f9c4e added test case for TextSnippet,
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen f5b817bac4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger cb2c17d236 extract author and keywords in .doc and .ppt parser
11 years ago
reger a5707cd2eb enable proper Author navigator
11 years ago
Michael Peter Christen 74206a10c7 refactoring
11 years ago
orbiter fec673c9d1 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 4a66af716d added apkParser stub (work in progress)
11 years ago
orbiter c59da9fe7a added access tracker log reader stub
11 years ago
reger 2d67f29244 adjust mergeDocument after parsing to
11 years ago
Michael Peter Christen 0d29b972cc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 36e623d8bf enhanced metadata enrichment for media file type search:
11 years ago
Michael Peter Christen 49886fab08 enhanced debugging
11 years ago
Michael Peter Christen b893c42a0f bugfix for image search
11 years ago
Michael Peter Christen c7995d3e2a increased fixed limit for http POST request sizes to 100MB
11 years ago
reger 7847a93558 fix AbstractParser.singleList not adding null strings
11 years ago
Michael Peter Christen 8acae852a0 write <em>-tagged texts also into the bold_txt field
11 years ago
reger 90c4576361 add a link to recrawl index entry to metadata html page
11 years ago
Michael Peter Christen 2626c8f6db using concurrency to do base64 encoding in file POST commands
11 years ago
Michael Peter Christen e132689818 fixed and enhanced Base64 (en)coder (again)
11 years ago
Michael Peter Christen 2415e3db43 enhanced ASCII byte[] -> String conversion
11 years ago
Michael Peter Christen 4751ed974f enhanced base64 encoding
11 years ago
Michael Peter Christen e949071160 removed superfluous date method
11 years ago
Michael Peter Christen 501d55cd35 removed superfluous assert
11 years ago
orbiter 0bbb5040b8 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 9d5d86cd03 Added filter query options to the ranking servlet /RankingSolr_p.html.
11 years ago
Michael Peter Christen d2151857f1 Added collection navigation:
11 years ago
Michael Peter Christen 74c249288a added a push api to make it possible to upload files directly without
11 years ago
Michael Peter Christen f13c8aa7dd re-implementation of file push option in the context of POST http
11 years ago