Commit Graph

8335 Commits (86534a56f7c23c05be032a80e4a26b5b459017a3)

Author SHA1 Message Date
luccioman 7d5ba2afa4 Added some JavaDoc and moved crawlStacker close at the right place.
8 years ago
luccioman 8edbcd8ad4 Log eventual Solr instances close errors.
8 years ago
reger 330768c8a2 fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
8 years ago
reger 585d2a6441 test case: for NewsPool to check the id modificator (for unique id)
8 years ago
luccioman de5c873e38 Removed unused JavaScript file docs.min.js
8 years ago
Michael Peter Christen df51e4ef07 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen e063aaf97f enable fuzzy search, solr style (append a ~ to get a fuzzyness on the
9 years ago
reger ff6589fc0f test case: simulating multi word query for local rwi index
9 years ago
reger e990297d2e avoid NPE on hello message with missing "yourip" key
9 years ago
reger e51ab8c7aa hack to generate a unique message-id for messages created in the same second
9 years ago
Michael Peter Christen b82300358a removed version number check because it does not work any more if
9 years ago
Michael Peter Christen 2107674999 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 0d28f563f4 fix for java version "9-ea"
9 years ago
reger 3b694b3935 add some javadoc to rwi wordreference distance, position
9 years ago
reger a4465c97d6 as requested, disable/remove old swf parser
9 years ago
reger 7f63fc50f3 prepare a IndexSegment test case for RWI index testing
9 years ago
reger 96467c5467 remove not needed counter in Tokeninzer (completing last changes)
9 years ago
luccioman d66b0f7b7b Fixed french messages encoding in YaCy tray.
9 years ago
reger 7efb66ee10 adjust the WordReference.join wordsintext calc to take the max (instead of sum)
9 years ago
luccioman 0a9ff14d96 Fixed NullPointerException case and added Javadoc
9 years ago
luccioman 06d4f93d03 Merged master into postprocessing branch
9 years ago
Michael Peter Christen b73d2db914 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 25a3c7a6d0 catch exception and write end of object
9 years ago
reger 272cdd496a reactivate sentence counter in WordTokenizer for phrasepos ranking,
9 years ago
Michael Peter Christen 5e165a8150 removed unused imports
9 years ago
Michael Peter Christen c716648c78 enhanced json encoding of strings
9 years ago
Michael Peter Christen 6139bd85a8 fix for broken facet names
9 years ago
Michael Peter Christen 5060f9fee9 fix for too long snippets
9 years ago
Michael Peter Christen 8681cee3f3 fix for bad comma
9 years ago
Michael Peter Christen db6d8fc197 fix for bad json
9 years ago
Michael Peter Christen 8f4a341735 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 9934f546bb added default fl to solr query, removed large texts retrieval and
9 years ago
reger 120bf7e6e2 implemented RWI WordReference to return the word position value (was always left empty)
9 years ago
reger e310ec5f70 fix posInText ranking calculation to score 0 on no position info
9 years ago
luccioman 74f9927ddc Merge remote-tracking branch 'origin/master' into dist_macOS
9 years ago
reger 51c077f493 adjust the getTopics() and getTopicNavigator() to current usage
9 years ago
reger 39dd244693 fix ConcurrentScoreMap.set() calculation of totalCount()
9 years ago
reger ebf818ad95 log an error on aborted news publish (due to duplicate news.id)
9 years ago
reger cc2d9dd3f1 reactivate the use of included-in-topwords boost in postRanking
9 years ago
luccioman 39ea28adfd Merged master to dist_macOS branch.
9 years ago
luccioman 8255e91c99 Fixed serverClassLoader.findClass method
9 years ago
reger 6801673a07 apply postranking media search boost only on media queries
9 years ago
luccioman 1dc4306058 Fixed indentation for better readability.
9 years ago
luccioman 8c49a755da Postprocessing refactoring
9 years ago
luccioman 42f45760ed Refactored postprocessing
9 years ago
reger 4386e84b55 correct NewPool retention calculation
9 years ago
reger 5e72d37f0a TransNews_p: add ad-hoc translation of target file on positive vote (addition to local translation)
9 years ago
reger 9462a32244 Added news service for easy, community driven UI translation support.
9 years ago
reger f8d6543a23 Rename class CreateTranslationMaster to TranslationManager and add
9 years ago
reger 19b4509d54 speed-up reading of xlif language file, by using xmlparser (stax) instead of jaxb
9 years ago
Michael Peter Christen e1fac86f53 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen a9316ceff6 force browser-caching of favicons from search results
9 years ago
Orbiter 503312ca43 Merge pull request #61 from luccioman/heroku_experiments
9 years ago
reger 33bf35d90f missing file for prev commit "Introduction of additional language setting browser"
9 years ago
reger 16e8ed3f01 Introduce additional language setting "browser/Browser Language" for UI internationalization.
9 years ago
reger 3b47a07dd1 change unused servletProperties entry CONNECTION_PROP_CLIENT_REQUEST_HEADER to
9 years ago
reger 036c1dc6ef fix CookieTest_p formatting (output of <br> as text),
9 years ago
Michael Peter Christen bf6709d196 fixed missing browser activation in linux
9 years ago
Michael Peter Christen d8504418b6 enhanced browser-caching of static content
9 years ago
Michael Peter Christen 079112358c Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen efeb592661 don't do solr optimization, this create high IO load. We should leave
9 years ago
luccioman 46b8836548 Copy image resources contained in donation iframe.
9 years ago
reger 4c7a77662a eliminate dependency on file-extension in storeDocument but use supported mime-type
9 years ago
reger ebde21079a refactor xlsParser to include Excel file attribute (like author) in parser result doc.
9 years ago
luccioman 744c9a2615 Opensearch desc : handle https protocol url with default port (443)
9 years ago
luccioman b9c28893ee Merged master to 'heroku' branch.
9 years ago
Michael Peter Christen 103a8348b3 fix for NPE and small performance enhancement
9 years ago
reger 2910fe35c1 add missing scheduler calc of next exec_date (call of calculateAPIScheduler)
9 years ago
reger 70d47ae38a keep scheduler selection by repeat entry from 07311020d4
9 years ago
reger 7c3f932e5d revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)
9 years ago
reger 07311020d4 postpone apicall exec date init until actual call
9 years ago
reger 5e335b32da fix Blacklist.contains() matching path pattern to string
9 years ago
reger 5e9e871192 fix Blacklist.remove by using pattern.toString to find pattern to remove,
9 years ago
reger 1843ea7e69 on Blacklist.add pattern to source file also update internal entry maps
9 years ago
reger bf6ce33da3 Correct use of _htDocsPath config in YaCyDefaultServlet to use servlet config variable
9 years ago
luccioman 480027ec98 Merge remote-tracking branch 'origin/master' into heroku_experiments
9 years ago
reger fcad2d0744 add uses of config constant INDEX_RECEIVE_ALLOW
9 years ago
reger 226f81cfcf declare poison pill url MultiProtocolURL() as protected to make sure not
9 years ago
reger f8632ad292 prevent string index out of bounds MultiProtocolURL.getPaths
9 years ago
reger 35a7d57260 update lucenematchversion to current (5.2.0 -> 5.5.0)
9 years ago
reger 9b07bbf955 deprecate newurl(), not used and already replaced
9 years ago
luccioman 47d486298f Merged changes from master.
9 years ago
reger 774b3906a9 fix GenericFormatter.parse ("time","timeoffset")
9 years ago
reger 27163af0e1 improve detection of referenced links by taking http and https link protocol
9 years ago
reger f89d4eb51d fix MultiProtocolURL init (assign of host) for urls with '/' in query part
9 years ago
reger 87fcfc6d78 Adjusted hash computation and toNormalform for file:// protocol to deliver
9 years ago
luccioman d6bf90803f Merged from main master branch.
9 years ago
luccioman 9b9c112263 Handle more properly local port configuration by system property
9 years ago
reger 3811184abd fix GSA servlet clientIP retrieval
9 years ago
reger 7ab41d4ff1 use directories original lastmodified date in file- & smbloader in response
9 years ago
reger 708bcbb042 one more replacement to use cached hosthash vs. calculated
9 years ago
luccioman b57a06d88e Let Heroku decide which http port to use
9 years ago
reger 22db449f2a to prevent crawler to concurrently access and alter same crawl queue
9 years ago
luccioman 893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Orbiter 50c5ddf1a1 Merge pull request #56 from luccioman/LibreJS
9 years ago
Michael Peter Christen 7466d390b2 small refactoring + do not accept too old peers during bootstrap
9 years ago
luccioman 6e96c7341a Merge remote-tracking branch 'origin/master'
9 years ago
reger 8d58a48029 remove wrong log line in CrawlSwitchboard
9 years ago
reger 5aaa057c65 ignore empty input lines in FileUtils.getListArray() to poka joke blacklist read.
9 years ago
reger 41c36ffd75 exclude rejected results from result count
9 years ago
reger d4da4805a8 internal wiki code, require header line to start with markup
9 years ago
reger e952e355a2 have Translator servlet adhoc apply added translation by translating a single file
9 years ago
reger b119ff65be clean out not used Switchboard variables
9 years ago
reger 223071337b Translator to take caution of word boundaries to identify text portion to
9 years ago
luccioman 009657791e Merge remote-tracking branch 'origin/master' into LibreJS
9 years ago
luccioman a73c9327a5 JavaScript License fixes for LibreJS compatibility
9 years ago
reger 0c40401d28 fix MessageBoard test for null data
9 years ago
reger 5b22c63030 Adjust TranslatorXliff to load default 1st and merge downloaded or modified local translation.
9 years ago
reger a2e0f00456 optimize Translator
9 years ago
reger a6ba1faa80 introduce a translation edit servlet Translator_p.html YaCy's UI text translation
9 years ago
reger b3c9041f79 remove with localHostNames redundant (but unused) publicIPv4HostNames and publicIPv6HostNames
9 years ago
reger bd8f7c11f5 Use transparent addToCrawler in AutoSearch instead of addToIndex
9 years ago
reger f23d8ab47b fix 2 more servlet RuntimeException in intranet mode thrown due to seed.getIP()
9 years ago
reger bb0076c3dd fix: assure close inputstream in TranslatorXliff after reading xlf file
9 years ago
reger 6384b7d82e fix NPE in Load_MediawikiWiki servlet in intranet mode
9 years ago
Michael Peter Christen 596b5dfa59 add the JRE version in the seed. Purpose: identify if it is possible to
9 years ago
reger 4cc38e979d add InputStream close after reading input file (Vocabulary_p servlet)
9 years ago
reger 6bf9c55584 adjust Solr select servlet to latest bugfix for boostquery (bq param)
9 years ago
Burkhard 9a18e2297b Merge pull request #51 from JeremyRand/multiple-boost-query
9 years ago
reger f0d7b93372 make use and activate autodetect charset in Vocabulary input from file
9 years ago
JeremyRand 433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
JeremyRand 58824dfa6c Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.
9 years ago
reger 9e94989237 upd to PDFBox 2.0.1
9 years ago
reger d0a571bed2 del cytag trail for own index.html (save resource not used by default)
9 years ago
reger de46879637 fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)
9 years ago
reger 24b0fa2a38 extend snapshot Html2Image.pdf2image to use PDFBox image export capability
9 years ago
reger eb2a00b1d8 fix NPE on missing crawldepth_i
9 years ago
reger efb9f1a8b7 save resource for unused blacklistFiles map
9 years ago
reger 5f113be760 cleanup connectPeer & yacyVersion.latestRelease usage
9 years ago
reger 7097dcbdbd cleanup hack for partial Solr update on multivalued datefields
9 years ago
reger f10ea3c155 clean-out unused SwitchboardConstants
9 years ago
reger ef24593347 delete obsolete SEARCHRESULT busythread constants
9 years ago
reger 125b5e26a5 apply bugfix for ChartPlotter from Pullreq 42
9 years ago
reger 06ce9ae711 prevent "unchecked conversion" compiler message
9 years ago
reger b4a576dbdf exclude unused protocol param "duetime"
9 years ago
reger 3bd6ae8d8b keep addon/Notepad++ keyword marker on lng export
9 years ago
reger 16837d60c7 fix version in locale version file
9 years ago
reger 0fb01e429e fix migration, account for ssl port in config (for auto-disable https)
9 years ago
reger 7be1c7a05a fix logger name
9 years ago
reger 1d940e5a94 upd commons-compress 1.11
9 years ago
reger 7789c32c82 delete crawl queue on init exception
9 years ago
reger f781b9dd47 revert call condition f. migration.installSkins
9 years ago
reger 3adb670f44 remove never used Domains.myHostNames set
9 years ago
reger 6ecc180299 fix rwi doubledom return best (highest) ranking
9 years ago
reger 2343e3f1cd keep and update existing xlf translation master instead of create new
9 years ago
reger a1935f485f Added utility class CreateTranslationMasters to create a language independent
9 years ago
reger acaf51b296 keep ConfigLanguage_p as 1st entry in exported translation file
9 years ago
reger 61c5b6b403 fix empty drop down list in ConfigLanguage after wrong/empty download
9 years ago
reger 4eddabee42 translate Network History screen -> de
9 years ago
reger 90c79014ae remove unused translator routine which also doesn't handle rel path input
9 years ago
reger 902e79e261 Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
9 years ago
reger d9adc2c255 load handler for Transparent Proxy on startup only if feature is activated
9 years ago
reger ec24a0c85a add test case for optimized toTokens()
9 years ago
reger cada24f918 adjust utility ListNonTranslatedFiles for path compare on windows
9 years ago
reger fb8ae14b21 make migration version safe
9 years ago
reger 258cd41577 reduce logging (EmbeddedSolrConnector.query)
9 years ago
reger 6783ef5540 move example code SearchClient out of yacycore package
9 years ago
Michael Peter Christen b89465d952 0N - basic dump upload servlet infrastructure, to share index dumps
9 years ago
Michael Peter Christen f12a900f3e harmonization of http post of files for one and several files - this had
9 years ago
Michael Peter Christen 849ab671a9 0n: modified the p2p bootstraping process - rules had been too tight and
9 years ago
reger 764f5100f0 fix delete of temp file after odt % ooxml parser
9 years ago
reger 379e9b330d use supplied url port to get robots.txt in crawlers hostqueue
9 years ago
reger 58a959403d fix mixed logfactory in UrlProxyServlet,
9 years ago
Michael Peter Christen 2494a820c7 0N - added recording of dump exports if given time frame is not negative
9 years ago
Michael Peter Christen ef2cc4f690 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen a6bf0b1649 0N - added option to generate index export files for a specific number
9 years ago
reger 6d56beaed8 fix assertion exception in toString of MultiProtocolURL
9 years ago
reger 42a7bdb2af fix SolrSelectServlet authentication to default to true
9 years ago
reger dbb28bb4f3 del unused statistic parameter (from status servlet)
9 years ago
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode.
9 years ago
reger caf9e98f09 put metadata dc_publisher in corresponding schema field
9 years ago
reger 38e2b054d4 remove servlet classloder internal cache map (to save the resources, cache hits marginal)
9 years ago
luc 3f338777f7 Also check and index eventual icon url information from metadata.
9 years ago
luc 9f712146df Display icons in ViewFile "links" mode.
9 years ago
luc 26f1ead57c Created ViewFavicon class specialized in favicon viewing.
9 years ago
reger 6f0b073bf3 override detected language (statistic langdetect) only with TLD determined
9 years ago
reger b65e2b527d include use of condenser's content text for language detection.
9 years ago
luc 07222b3e1a Added favicon url transmission in RWI chunks.
9 years ago
luc 480772c070 Fixed json search results from commit "Improved URLLicence reliability"
9 years ago
reger 937fbb0b9f correct isHidden() for smb from last commit
9 years ago
reger 535d4bf75f respect hidden attribute for file and smb directory listing
9 years ago
luc 3cc5619d93 Improved HTML icons indexing and rendering in search results.
9 years ago
luc edef6cd0dc Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger c28142095a add findClass() to servlet class loader (used in YaCyDefaltServlet)
9 years ago
luc f7b854465b Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
reger 2048b7e057 support scraping start-/enddate from html tag with property "datetime"
9 years ago
reger 900d4584ba complete resource cleanup of lists in contentscraper's close()
9 years ago
luc aa60ad1dbc Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 1f18653de0 pass parsed swf content through htmlscraper
9 years ago
reger 18ecf57792 add support of compressed swf to swfParser
9 years ago
sixcooler 5cb7ba0dc4 fix for connections not getting closed to get favicon.ico during search
9 years ago
luc ef83e34b8a Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger ed3e16e092 apply remote result count config value to Bookmark Autosearch
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
Ryszard Goń 1728cd30c6 Create autocrawl profiles
9 years ago
luc 41767a01c2 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger ff27824964 fix swfParser reading file signature
9 years ago
luc 7aa1a29e33 Return more accurate HTTP status 400 with detail message when some error
9 years ago
luc bd9dc2f32b Corrected NullPointerException cases occurring in YJsonResponseWriter
9 years ago
luc 0076f9f97d Updated documented sample url
9 years ago
luc cfdbc2b487 Improved URLLicence reliability for use by concurrent non authorized
9 years ago
reger c91e712178 further refactor using standard java / (one) utf-8 charset variable
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
sixcooler 5a35f9383a bump to solr/lucene 5.4.0
9 years ago
reger a58d34a4e8 check error URL cache before adding errorDoc to index
9 years ago
reger e9539b1086 reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
9 years ago
reger cd26717ba2 fix low memory status hint (dht-in disabled)
9 years ago
reger a5faf73afa remove obsolete yacy.init entries interaction.*
9 years ago
sixcooler dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger 46ac0867ff fix poison mediawikiimporter output queue also after ExecutionException
9 years ago
reger a7591d3ed0 fix mediawikiimporter number format exception on coordinate parsing
9 years ago
reger 9da1712a31 increase http header EXPIRES for css and images in DefaultServlet
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
reger 80e2c82249 fix NPE on empty blog importfile parameter
9 years ago
reger e84d94f8ca fix mime table for ms office / open office documents
9 years ago
reger 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
9 years ago
reger d5fd031449 fix reading of ippattern config array in URLProxy
9 years ago
reger b7e8358645 make use of header.getContentType where possible (mime is normalized afterwards)
9 years ago
reger 7a8c077838 fix HeaderFramework.mime() to strip charset parameter.
9 years ago
reger b4b6910d60 fix (todo): correct doc.id of remote search result if no match with newly
9 years ago
reger dec3e6ad96 fix: adjust urlstub for mailto links
9 years ago
reger cb83e65f89 drop returning document language "en" if unknown (fix todo)
9 years ago
reger 0c5548a7ff fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger 71c416f383 show mailto links in ViewFile.html linklist
9 years ago
reger 6b7c10cef8 fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
reger 14803d58cd let html scraper accept html5 <link rel="icon"> for favicon links
9 years ago
luc b4cdacee76 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc ba0a293f5c Corrected another case of
9 years ago
reger 4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection
9 years ago
luc 8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
9 years ago
luc a2c08402af Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 70595d05d0 Modified MemoryControl.main() test to properly end for better results
9 years ago
sixcooler 1be67d9ab6 CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
9 years ago
reger 28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
9 years ago
reger 020630efd8 remove unused network scanner parameter from queryparameter
9 years ago
luc ad5586f8f6 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
luc 7736ee5a42 Updated MediaWimporter main() : display usage in console and stop
9 years ago
reger cdb8f3b10d make current ranking score value avail. to search interface / api
9 years ago
luc 27d11f8671 Fixed isSolrDump function : PushBackInputStream was not unread when
9 years ago
Michael Peter Christen 135a123a77 less logging in new language detection
9 years ago
Michael Peter Christen ef8cd80593 fix for npe
9 years ago
reger 0347bfa71f Apply collection query constraint/modifier to rwi result stack.
9 years ago
luc 2a67d2ba6f Corrected error management for unsupported image formats, parsing
9 years ago
Michael Peter Christen d6e9834040 Merge branch 'master' of
9 years ago
Michael Peter Christen d82d311995 Merge branch 'master' of https://github.com/luccioman/yacy_search_server
9 years ago
reger b5371ea8c1 read/init crawl queue in a thread
9 years ago
reger 1160b13172 remove unused md5 from ViewFile servlet params
9 years ago