Commit Graph

7408 Commits (4a35570c903da7058ff3d5eef5265919f3c648ec)

Author SHA1 Message Date
Michael Peter Christen ec6082c872 very bad language detection hack fix hack
10 years ago
Michael Peter Christen 39615de3f9 adding the buffer size is not wrong but may cause confusing information
10 years ago
Michael Peter Christen 395edec6f1 changed strategy to count the number of documents: get the max of
10 years ago
Michael Peter Christen e87dc08c0d set the correct fail time in error docs
10 years ago
Michael Peter Christen cfb20bc0ce removing the [] for ipv6 addresses may be a bad idea..
10 years ago
orbiter b6d57f06eb enhanced the apk parser (up to being production-ready).
10 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references
10 years ago
Michael Peter Christen 57ce7eeff3 fixed localhost authorization and replaced the adminRealm with an info
10 years ago
orbiter f318d7c285 enhanced date-ordered ranking
10 years ago
reger a6891ff7f8 fix Querygoal.parse exception on +/-null-term
10 years ago
reger c7335318eb remove unused legacy procedure from httpserver
10 years ago
Michael Peter Christen eab0d3e1a9 bugfix for wrong lock display, see
10 years ago
orbiter 49d4f95faf bugfix to latest commit
10 years ago
orbiter 68211f8244 enable Crawler_p servlet if a rss feed or a wiki dump import was
10 years ago
orbiter a65df4ce7e do not push noindex errors into log if in intranet mode. noindex
10 years ago
orbiter 688c6d8954 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
orbiter 4ae7aead28 addon to latest fix
10 years ago
Marc Nause 2af56fa37d Improved UPnP. (still not perfect)
10 years ago
orbiter b3ebd38079 removed the HTDOCS repository concept because the concept to host files
10 years ago
reger 1fdcc2d67b change seedfile upload ip check to allow intranet ip in intranet mode
10 years ago
reger e31b0e6d67 - update javadoc Seed.getIP
10 years ago
reger 350c6b8250 in IntranetMode allow intranet hosted seedlist with Network_Domain "any"
10 years ago
orbiter d68438c3d9 make sure that the postprocessing background thread never dies by any
10 years ago
orbiter b4f2a1db6e added an unlock icon for all protected pages that are unlocked because
10 years ago
reger ea6c9e9b07 reduce mem buffer overhead for gap files during r/w
10 years ago
reger e88537522d allow single quote " ' " in query
10 years ago
orbiter 487021fb0a snippet computation update
10 years ago
orbiter 1c2f1f233a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
reger 5a4995ded3 fill solr rss writer dc:subject tag with keyword content
10 years ago
orbiter 927aaa95a6 concurrency bugfix
10 years ago
orbiter c9e593cf78 removed warnings
10 years ago
reger 7584352e7b use more predefined Solr query parameter constants
10 years ago
reger f9db5dd6c5 reduce doublecontent check document (prevent out of memory)
10 years ago
reger e9eae45b55 simplify rssreader and improve atom feed link extraction
10 years ago
reger a8508417d1 catch NPE during crawl (OAI import)
10 years ago
reger 3dde94422f center searchevent lines on network graph
10 years ago
Michael Peter Christen 3860711aef fix for possible interruption of concurrent queries
10 years ago
Michael Peter Christen 6344718f8b reducing the concurrent query stack size and reduced concurrency of
10 years ago
Michael Peter Christen eca9380e3d bugfix for crawler double-check: if an url is redirected, the
10 years ago
Michael Peter Christen 9ac0c93f17 fix for subpath crawl filter
10 years ago
Michael Peter Christen 66106bdaf0 fix for crawler attribute maxdompages
10 years ago
Michael Peter Christen 49d91b94c3 npe fix in crawler
10 years ago
Michael Peter Christen b7183a7321 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger ea2e627662 fix ConfigAccounts del user with uppercase letter in name
10 years ago
Michael Peter Christen c465b791af typo
10 years ago
Michael Peter Christen 191ec8c82a added concurrency to postprocess rewrite process
10 years ago
Michael Peter Christen a1e8bdd5e9 log ppm instead of docs/second
10 years ago
Michael Peter Christen cc0ded7abd set process type of web graph according to fields as defined in the
10 years ago
Michael Peter Christen 12fb9d7cd1 log postprocessing constraints in case that postprocessing is not
10 years ago
Michael Peter Christen 3c23b89823 less logging
10 years ago
Michael Peter Christen a0c53174c5 better solr query logging to detect unnecessary sort requests for more
10 years ago
Michael Peter Christen 338f574bdc no sorting if http/www unique fields are not demanded (makes query
10 years ago
Michael Peter Christen 1609763be5 toString fix
10 years ago
Michael Peter Christen b983e68254 more retries, less sleep
10 years ago
Michael Peter Christen 1503ba7794 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger 8f77719091 fix "Ljava.lang.String" in crawl queue anchor name
10 years ago
Michael Peter Christen 0ceeceb35e more logic on Solr queries; usage of the query terms in postprocessing,
10 years ago
orbiter 38864ae004 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
orbiter 4099296b45 added new classes which shall reduce call overhead to Solr (stub)
10 years ago
reger d0c02e1de7 adjust rss lat/lon to double
10 years ago
orbiter 3491ab4c38 removed unused images from webgraph edge computation
10 years ago
orbiter 2371d6b8db target linktexts must be string to enable search facets on these fields
10 years ago
Michael Peter Christen 001e05bb80 do not store failure of loading of robots.txt into the index as a fail
10 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
10 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
10 years ago
Marc Nause 9df14fc126 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Marc Nause 477be17c51 Replaced old UPNP library with Weupnp. UPNP should
10 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
10 years ago
orbiter e9163e7e10 fix for malformed hostpath names in crawl balancer
10 years ago
Michael Peter Christen c115f3869c enhanced snippet computation and test method in ViewFile
10 years ago
reger 6c10b59f3e move bootstrap peers test systems to its test class
10 years ago
orbiter 1027f3d04a fix for the usage of ready-prepared solr queries, some queries are
10 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
10 years ago
Michael Peter Christen 6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
10 years ago
orbiter 4b06adb751 fix for file urls
10 years ago
orbiter 08409ec680 no idea why the words max was an ordered one. This change increases speed
10 years ago
reger e5854a5cdb fix localhost link to opensearchdescription.xml
10 years ago
Michael Peter Christen b44626e55b fixed target_alt_t in webgraph
10 years ago
Michael Peter Christen 504327b15c fix for condition for writing the webgraph
10 years ago
Michael Peter Christen 542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
10 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
10 years ago
reger c95ba52cf0 improve logexception info
10 years ago
orbiter e441831a24 reverted toString() change in AnchorURL to prevent mistakenly used
10 years ago
reger 47f201a6b8 Add Solr default query fields (&qf) to select servlet
10 years ago
reger f96cfdc84d prevent array out of bound exception on getRankingProfile(x)
10 years ago
reger 5f5fb4ecdc remove unused static (RSS)search from protocol
10 years ago
reger 7c1706d83a use CRLF in generated bat command scripts for windows
10 years ago
reger a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
10 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
10 years ago
Michael Peter Christen bf1b6b93e7 do not write CR values to webgraph if no CR values are computed
10 years ago
Michael Peter Christen e039e78210 small bugfixes
10 years ago
Michael Peter Christen 32a2ff925c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen d07cdd8c3b added SolrCloud access mode and configuration
10 years ago
Michael Peter Christen 8514bffc22 enhanced postprocessing status report
10 years ago
reger b24572f304 fix GSA filter query assignment
10 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
10 years ago
Michael Peter Christen 62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
10 years ago
Michael Peter Christen dd5cdfe212 reverted filter query hack, it did not work
10 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling
10 years ago
Michael Peter Christen 5326970d6c enhanced solr queries for single document extraction
10 years ago
Michael Peter Christen 525575bd97 added debugging of filter queries in thread dump thread names
10 years ago
Michael Peter Christen f319ef268f testing filter queries instead of queries to retrieve documents by id
10 years ago
Michael Peter Christen fd87fa1613 removed more unnecessary exist-checks in ErrorCache
10 years ago
Michael Peter Christen f2b476e08b don't do a double check to solr for failed documents if they are not
10 years ago
Michael Peter Christen 06ab72d1af enhanced crawler host round-robin strategy
10 years ago
orbiter dab9a0786a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 51bf5c85b0 Renamed the transmission cloud to buffer in dispatcher since the name
11 years ago
Michael Peter Christen a694b6a8fc another fix for unique field computation
11 years ago
Michael Peter Christen fb3dd56b02 fix for processing of noindex flag in http header
11 years ago
Michael Peter Christen b0d941626f fixed bugs in canonical, robots and title/description unique calculation
11 years ago
reger d9472d043a cleanup older unused classes
11 years ago
reger 665e12f88e move startup time from old serverCore to switchboard (most used here)
11 years ago
reger 336425912a remove unused localSearchThread from SearchEvent
11 years ago
reger 32bd2a61c1 add local ip to AbstractRemoteHandler local hostname cache
11 years ago
Michael Peter Christen f3a6b6e21e fix for bad URL decoding
11 years ago
Michael Peter Christen 1092e798a5 fixed double content postprocessing
11 years ago
Michael Peter Christen aee5b108e5 added linkScraperParser, a parser which ignores the text like the
11 years ago
reger 2b8cc5832c fix seek error for 0 file size records file
11 years ago
reger 2ba394333f fix Crawler HostQueue release of stackfile
11 years ago
reger 40133ba2d0 fix NPE in Condenser,
11 years ago
orbiter 59160984cc timeline performance update
11 years ago
orbiter 54bea96e67 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen 841cc77391 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen e09218129c remove check for local solr. This check was made during a time when Solr
11 years ago
orbiter 2073e69034 fix for long periods in timeline
11 years ago
reger 1f94df29e7 fix NPE in solr rss where snippet contains only the title text
11 years ago
Michael Peter Christen 09dcdb9b19 update to solr 4.9.0
11 years ago
Michael Peter Christen 1cd4b2e8be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 8c52f0651b refactoring of AccessTracker events & timeline fix
11 years ago
reger 431a5f9c4e added test case for TextSnippet,
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen f5b817bac4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger cb2c17d236 extract author and keywords in .doc and .ppt parser
11 years ago
reger a5707cd2eb enable proper Author navigator
11 years ago
Michael Peter Christen 74206a10c7 refactoring
11 years ago
orbiter fec673c9d1 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 4a66af716d added apkParser stub (work in progress)
11 years ago
orbiter c59da9fe7a added access tracker log reader stub
11 years ago
reger 2d67f29244 adjust mergeDocument after parsing to
11 years ago
Michael Peter Christen 0d29b972cc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 36e623d8bf enhanced metadata enrichment for media file type search:
11 years ago
Michael Peter Christen 49886fab08 enhanced debugging
11 years ago
Michael Peter Christen b893c42a0f bugfix for image search
11 years ago
Michael Peter Christen c7995d3e2a increased fixed limit for http POST request sizes to 100MB
11 years ago
reger 7847a93558 fix AbstractParser.singleList not adding null strings
11 years ago
Michael Peter Christen 8acae852a0 write <em>-tagged texts also into the bold_txt field
11 years ago
reger 90c4576361 add a link to recrawl index entry to metadata html page
11 years ago
Michael Peter Christen 2626c8f6db using concurrency to do base64 encoding in file POST commands
11 years ago
Michael Peter Christen e132689818 fixed and enhanced Base64 (en)coder (again)
11 years ago