reger
5f113be760
cleanup connectPeer & yacyVersion.latestRelease usage
...
obsolete since
527b3decde
9 years ago
reger
7097dcbdbd
cleanup hack for partial Solr update on multivalued datefields
...
has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
reger
f10ea3c155
clean-out unused SwitchboardConstants
9 years ago
reger
ef24593347
delete obsolete SEARCHRESULT busythread constants
...
not used since 29.05.2013 18:27:27
0c1a018bbd
9 years ago
reger
125b5e26a5
apply bugfix for ChartPlotter from Pullreq 42
...
https://github.com/yacy/yacy_search_server/pull/42
thanks to otteresk (https://github.com/otteresk )
9 years ago
reger
06ce9ae711
prevent "unchecked conversion" compiler message
...
+ include "translate" property in xlf "trans-unit" export
9 years ago
reger
b4a576dbdf
exclude unused protocol param "duetime"
...
(receiver interprets param "time" only)
9 years ago
reger
3bd6ae8d8b
keep addon/Notepad++ keyword marker on lng export
...
(length of remarks divider line)
+ harmonize status_p.inc lng text
9 years ago
reger
16837d60c7
fix version in locale version file
...
(it's compared to full version)
9 years ago
reger
0fb01e429e
fix migration, account for ssl port in config (for auto-disable https)
9 years ago
reger
7be1c7a05a
fix logger name
9 years ago
reger
1d940e5a94
upd commons-compress 1.11
9 years ago
reger
7789c32c82
delete crawl queue on init exception
...
(happens occasionally on path name violation and will never get resolved)
9 years ago
reger
f781b9dd47
revert call condition f. migration.installSkins
...
(a bug introduced in fb8ae14b21, see comment on that commit)
9 years ago
reger
3adb670f44
remove never used Domains.myHostNames set
9 years ago
reger
6ecc180299
fix rwi doubledom return best (highest) ranking
9 years ago
reger
2343e3f1cd
keep and update existing xlf translation master instead of create new
...
in utility CreateTranslationMasters
+ small fixes in lng's
9 years ago
reger
a1935f485f
Added utility class CreateTranslationMasters to create a language independent
...
translation master as source to harmonize individual translation files
Included a main to create masters in YaCy and xliff format for testing
+ restrict TranslatorXliff to use only entries with State=translated
P.S. used https://open-language-tools.java.net/editor/about-xliff-editor.html to
experiment with xlf output (haven't a Pootle avail.)
9 years ago
reger
acaf51b296
keep ConfigLanguage_p as 1st entry in exported translation file
...
+ rem untranslated text & some typo fixes in several translations
(considering to create a translation master file to harmonize entries)
9 years ago
reger
61c5b6b403
fix empty drop down list in ConfigLanguage after wrong/empty download
...
+ add xliff translated attribute
+ append japanese lng name
9 years ago
reger
4eddabee42
translate Network History screen -> de
...
+ remove leftover debug line
9 years ago
reger
90c79014ae
remove unused translator routine which also doesn't handle rel path input
...
+ correct some language file match issues
9 years ago
reger
902e79e261
Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
...
This eases up suggested initiatives from http://mantis.tokeek.de/view.php?id=649
Allows longer term also to store translation maps for the htroot files
in standardized/reuseable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ).
+ added test case creating and comparing xliff file with internal custom prop file.
(currently the introduced class is not used in core code)
9 years ago
reger
d9adc2c255
load handler for Transparent Proxy on startup only if feature is activated
...
to save the resources and keep handler chain small if the feature is not used.
+add a warning message on settingsack_p page to restart on first activation
9 years ago
reger
ec24a0c85a
add test case for optimized toTokens()
9 years ago
reger
cada24f918
adjust utility ListNonTranslatedFiles for path compare on windows
...
(backslash replace)
9 years ago
reger
fb8ae14b21
make migration version safe
9 years ago
reger
258cd41577
reduce logging (EmbeddedSolrConnector.query)
...
mainly to reduce the frequent metadata checks like
> EmbeddedSolrConnector.query QUERY: q={!cache=false raw f=id}xXxXxX&rows=1&start=0&fl=id,load_date_dt
(p.s. direct servlet queries logged via AccessTracker.addToDump)
9 years ago
reger
6783ef5540
move example code SearchClient out of yacycore package
...
to example directory
9 years ago
Michael Peter Christen
b89465d952
0N - basic dump upload servlet infrastructure, to share index dumps
...
within an experimental new sharing model
9 years ago
Michael Peter Christen
f12a900f3e
harmonization of http post of files for one and several files - this had
...
been done differently - and wrong for several files. also: base64-encoding
for gzipped push files because our data structures currently only
support ASCII POST pushes.
9 years ago
Michael Peter Christen
849ab671a9
0n: modified the p2p bootstrapping process - rules had been too tight and
...
did not support the re-start of a network with just one principal peer.
9 years ago
reger
764f5100f0
fix delete of temp file after odt & ooxml parser
...
Close zipfile after parsing
9 years ago
reger
379e9b330d
use supplied url port to get robots.txt in crawlers hostqueue
9 years ago
reger
58a959403d
fix mixed logfactory in UrlProxyServlet,
...
Class doesn't use functions of declared ancestor, change to extend on httpservlet
9 years ago
Michael Peter Christen
2494a820c7
0N - added recording of dump exports if given time frame is not negative
9 years ago
Michael Peter Christen
ef2cc4f690
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
a6bf0b1649
0N - added option to generate index export files for a specific number
...
of minutes in the past and reverted latest change. The export file dump
will now contain four data elements: f - first date of index entry write
date, l - last date of index write date, n - now-date of index dump
time, c - count of numbers inside the dump. '0N' denotes a series of
changes which will lead to the opportunity to exchange index data dumps
in a way that is needed to integrate ZeroNet index data. This will be
based on index dump sharing; that causes this commit.
9 years ago
reger
6d56beaed8
fix assertion exception in toString of MultiProtocolURL
...
toString of AnchorURL and MultiProtocolURL are identical code
(no need to override or to protect call to parent)
as reported in https://github.com/yacy/yacy_search_server/issues/43
9 years ago
reger
42a7bdb2af
fix SolrSelectServlet authentication to default to true
9 years ago
reger
dbb28bb4f3
del unused statistic parameter (from status servlet)
9 years ago
reger
06d0e2aeb9
result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
...
- Above brought up that parser start url parameter, declared as AnchorURL, uses only methods of parent object DigestURL (changed parameter declaration accordingly).
9 years ago
reger
caf9e98f09
put metadata dc_publisher in corresponding schema field
9 years ago
reger
38e2b054d4
remove servlet classloader internal cache map (to save the resources, cache hits marginal)
...
- DefaultServlet includes already a class cache "templateMethodCache" which is emptied
on low mem status
- avoids a classloader cache that gets no hits but over time holds all (used) servlet classes
9 years ago
reger
6f0b073bf3
override detected language (statistic langdetect) only with TLD determined
...
language if langdetect probability is not high.
+ additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh
used by YaCy
9 years ago
reger
b65e2b527d
include use of condenser's content text for language detection.
...
Language identification may show poor performance on documents with short or no
title but clear lang indication in text content. Using content text too
improves lang detection.
+ remove double caching of text in Identificator
9 years ago
reger
937fbb0b9f
correct isHidden() for smb from last commit
9 years ago
reger
535d4bf75f
respect hidden attribute for file and smb directory listing
...
(hidden directories are not listed, affects crawling of local file system)
9 years ago
reger
c28142095a
add findClass() to servlet class loader (used in YaCyDefaultServlet)
...
In the 2 cases where servlet calls servlet the jvm classloader chain is
invoked and servlet class loaded by jvm loader (successful while requiring
htroot in system classpath). This patch uses the standard override design
for loaders to handle these cases (making it no longer crucial to have htroot
in system classpath, as this classLoader is mainly used for servlets and
looks in this case for the class in the configured path).
+ As the default classloader is parallelcapable we should register this too.
9 years ago
reger
a6617ad887
expand initRemoteCrawler() to terminate worker threads if called to deactivate
...
remote crawl.
On startup we save the resources for remote crawler if disabled. Once started
threads are running idle after disable remote crawl. Now threads are terminated
to save the resources also while disabling during runtime.
+ remove empty class Channels
9 years ago
reger
2048b7e057
support scraping start-/enddate from html tag with property "datetime"
...
This may be used in html5 <time> tag (which we don't explicitly support yet for date in content scraping).
9 years ago
reger
900d4584ba
complete resource cleanup of lists in contentscraper's close()
9 years ago
reger
1f18653de0
pass parsed swf content through htmlscraper
...
Swf may contain subset of html tags which shouldn't appear as text.
Especially <font> tag may totally screw up metadata servlet if not filtered out.
9 years ago
reger
18ecf57792
add support of compressed swf to swfParser
...
from JavaSWF2 (source compatible to WebCat).
Moved swf file signature check to parser
Changed use of synced vector to list swf InStream
9 years ago
sixcooler
5cb7ba0dc4
fix for connections not getting closed to get favicon.ico during search
9 years ago
reger
ed3e16e092
apply remote result count config value to Bookmark Autosearch
...
+ prepare to make the widely unused Bookmark feature optional
9 years ago
Ryszard Goń
a98c395023
Add the Autocrawl thread
9 years ago
Ryszard Goń
1728cd30c6
Create autocrawl profiles
9 years ago
reger
ff27824964
fix swfParser reading file signature
...
before passing to library (current version expects data w/o signature)
9 years ago
reger
c91e712178
further refactor using standard java / (one) utf-8 charset variable
...
extending initiative of commit 9a25751850
9 years ago
luc
571bc55937
Refactoring : use StandardCharsets constants instead of hard-coded
...
charset names.
9 years ago
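To illustrate the refactoring named in the entry above, here is a minimal sketch. The class and method names are hypothetical and not taken from the YaCy sources; only `java.nio.charset.StandardCharsets` is the real JDK API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class CharsetSketch {

    // With a hard-coded name, text.getBytes("UTF-8") looks the charset up by
    // name and forces callers to handle UnsupportedEncodingException.
    // The constant avoids both the lookup and the checked exception.
    static byte[] encode(final String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(final byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(final String[] args) {
        final byte[] raw = encode("Go\u0144");
        System.out.println(raw.length + " bytes, round trip: " + decode(raw));
    }
}
```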
reger
1af0e9ef74
remove workaround for Solr bug regarding multivalued date fields
...
fixed in 5.4.0
http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
sixcooler
5a35f9383a
bump to solr/lucene 5.4.0
9 years ago
reger
a58d34a4e8
check error URL cache before adding errorDoc to index
...
- del obsolete related switchboardconstant
9 years ago
reger
e9539b1086
reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart
...
- add filename to parameter fieldname
- add filecontent to special parameter fieldname$file
(some servlets use this $file parameter)
fix for http://mantis.tokeek.de/view.php?id=542
9 years ago
reger
cd26717ba2
fix low memory status hint (dht-in disabled)
...
http://mantis.tokeek.de/view.php?id=619
9 years ago
reger
a5faf73afa
remove obsolete yacy.init entries interaction.*
...
(related to removed triplestore)
9 years ago
sixcooler
dce1cb65c4
Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger
46ac0867ff
fix poison mediawikiimporter output queue also after ExecutionException
...
in worker thread.
Writer of importer needs a poison to close the file. On exception (e.g. OOM)
add a poison marker in outermost try/catch to ensure output queue will terminate
in this condition too (and closes+renames the surrogate/in/xxx.prt file)
9 years ago
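The poison-marker idea from the entry above can be sketched roughly as follows. This is not the MediawikiImporter code: all names are hypothetical, and the sketch queues the marker in a `finally` block (a variant of the "outermost try/catch" the commit mentions) so the consumer terminates on both normal and failed runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of a poison marker that survives worker failures.
final class PoisonSketch {

    // Sentinel compared by identity, so no real record can collide with it.
    static final String POISON = new String("POISON");

    // Producer: feeds records to the queue. A null record simulates a failure
    // inside the worker. The poison is queued in any case.
    static void produce(final List<String> records, final BlockingQueue<String> out) {
        try {
            for (final String record : records) {
                if (record == null) throw new IllegalStateException("simulated worker failure");
                out.put(record);
            }
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (final RuntimeException e) {
            // a real importer would log the failure here
        } finally {
            out.offer(POISON); // assumes an unbounded queue, so offer() succeeds
        }
    }

    // Consumer: collects records until the poison arrives, then stops.
    // A real writer would close and rename its output file at this point.
    static List<String> consume(final BlockingQueue<String> in) {
        final List<String> written = new ArrayList<>();
        try {
            while (true) {
                final String record = in.take();
                if (record == POISON) break;
                written.add(record);
            }
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```

Without the marker in the failure path, `consume` would block forever on `take()` once the producer dies.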
reger
a7591d3ed0
fix mediawikiimporter number format exception on coordinate parsing
...
handle incomplete metadata like "NS=43/50//N".
For other {expr ... } type entries a try catch added
9 years ago
reger
9da1712a31
increase http header EXPIRES for css and images in DefaultServlet
...
to increase browser cache hits for not changing content
9 years ago
reger
6d54eb3d36
skip loading document on crawl start for YMark bookmarks
...
by adding a constructor giving the already loaded document as parameter.
9 years ago
reger
80e2c82249
fix NPE on empty blog importfile parameter
9 years ago
reger
e84d94f8ca
fix mime table for ms office / open office documents
...
(causing wrong parser detect in intranet mode)
9 years ago
reger
45b9bd8403
adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
...
and feeding hyperlinks to webgraph processing.
9 years ago
reger
d5fd031449
fix reading of ippattern config array in URLProxy
9 years ago
reger
b7e8358645
make use of header.getContentType where possible (mime is normalized afterwards)
...
otherwise use header.mime() differentiated in prev. commit.
9 years ago
reger
7a8c077838
fix HeaderFramework.mime() to strip charset parameter.
...
Differentiate mime() and getContentType() which gives the raw header field.
This improves parser detection if charsets are included in http content-type field.
9 years ago
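A minimal sketch of the distinction described in the entry above: the raw Content-Type header value versus the bare media type with parameters such as `charset` stripped. The class, the method signature and the fallback argument are assumptions for illustration, not the actual `HeaderFramework` API.

```java
import java.util.Locale;

// Hypothetical stand-in for the mime normalization, for illustration only.
final class MimeSketch {

    // Returns the media type of a raw Content-Type value without parameters,
    // lower-cased, or the fallback if the header is missing or empty.
    static String mime(final String rawContentType, final String fallback) {
        if (rawContentType == null) return fallback;
        final int pos = rawContentType.indexOf(';');
        final String type = (pos < 0 ? rawContentType : rawContentType.substring(0, pos)).trim();
        return type.isEmpty() ? fallback : type.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        System.out.println(mime("text/html; charset=UTF-8", "application/octet-stream"));
    }
}
```

A parser lookup keyed on the stripped value matches `text/html` regardless of which charset the server appended.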
reger
b4b6910d60
fix (todo): correct doc.id of remote search result if no match with newly
...
calculated doc hash if different.
Testing showed that in some cases delivered url doesn't match the local
calculated hash. In this case replace doc.id (and host_id_s) with calculation
from url.
9 years ago
reger
dec3e6ad96
fix: adjust urlstub for mailto links
...
(skip protocol)
9 years ago
reger
cb83e65f89
drop returning document language "en" if unknown (fix todo)
...
which also harmonizes handling of query.modifier for rwi and solr results
(the result must match a given language filter)
9 years ago
reger
0c5548a7ff
fix (todo) remove redundant holding of email link nameproperty in parser document
9 years ago
reger
71c416f383
show mailto links in ViewFile.html linklist
9 years ago
reger
6b7c10cef8
fix dc:date in mediawikiimporter/document.writexml to use lastmodified
9 years ago
reger
14803d58cd
let html scraper accept html5 <link rel="icon"> for favicon links
9 years ago
luc
b4cdacee76
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
ba0a293f5c
Corrected another case of
...
"org.apache.lucene.store.AlreadyClosedException" occurring when
SearchEvent.cleanup() was called while committing local solr index.
9 years ago
reger
4d2b934487
prevent mailto links getting into parser result document's in/outbound link collection
...
by checking mailto scheme early.
- fix upper case mailto protocol assignment
- add test case for getProtocol
9 years ago
luc
8c4ab9c76b
Added an option to optionally limit size of remote solr documents put to
...
local index. See mantis #626 .
9 years ago
luc
a2c08402af
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
70595d05d0
Modified MemoryControl.main() test to properly end for better results
...
displaying.
9 years ago
sixcooler
1be67d9ab6
CachedSolrConnector was replaced by ConcurrentUpdateSolrConnector years
...
ago - time to let it go
Commented out unused table of cache-objects
9 years ago
reger
28b8bc290a
fix use of NETWORK_SEARCHVERIFY for rwi verification
...
was not used to set the searchevent parameter (done in SearchEventCache.getEvent)
- remove unused corresponding QueryParams.filterfailurls param.
9 years ago
reger
020630efd8
remove unused network scanner parameter from queryparameter
...
Search event is not using networkscanner
(removed filterscannerfail param always init to false)
9 years ago
luc
ad5586f8f6
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
8ebefa4233
Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
...
failing. Looks like it was broken since Commit
b43811d38c
9 years ago
luc
7736ee5a42
Updated MediawikiImporter main() : display usage in console and stop
...
properly without calling System.exit
9 years ago
reger
cdb8f3b10d
make current ranking score value avail. to search interface / api
...
Update the result score result field with the result queue ranking value to reflect
the actual calculated/used score,
for rwi & solr stack results.
(calc. etc. is unchanged, it's just that result entry carries the latest val
as api retrieves the number from it)
9 years ago
luc
27d11f8671
Fixed isSolrDump function : PushBackInputStream was not unread when
...
returning false (for example with a WikiMedia dump).
9 years ago
Michael Peter Christen
135a123a77
less logging in new language detection
9 years ago
Michael Peter Christen
ef8cd80593
fix for npe
9 years ago
reger
0347bfa71f
Apply collection query constraint/modifier to rwi result stack.
...
Collection is not available in pure rwi entries (but in local solr metadata)
But if user wishes to filter by query constraint also rwi shall adhere to this
(even if only rwi entries with parsed or solr received metadata may fit)
9 years ago
luc
2a67d2ba6f
Corrected error management for unsupported image formats, parsing
...
errors, and unavailable resources : avoid logging too many exceptions as
these errors easily occur when searching images.
9 years ago
Michael Peter Christen
d6e9834040
Merge branch 'master' of
...
https://github.com/Scarfmonster/yacy_search_server
# Conflicts:
# .classpath
# build.xml
9 years ago
Michael Peter Christen
d82d311995
Merge branch 'master' of https://github.com/luccioman/yacy_search_server
...
# Conflicts:
# .classpath
9 years ago
reger
b5371ea8c1
read/init crawl queue in a thread
...
to speed-up YaCy start on large existing crawler queues
9 years ago
reger
1160b13172
remove unused md5 from ViewFile servlet params
9 years ago
reger
e163ea88f6
fix vsdParser (Visio) parser return statement
...
(final block un-necessary throw)
9 years ago
reger
b2c8bc0ae6
remove md5_s from default index fields
...
it is not assigned a value / not used
Due to above also excluded from transfer protocol.
9 years ago
luc
e40ae0943b
- No max dimensions specified : render raw image data when source and
...
target image format are the same.
- Corrected scaling condition.
9 years ago
reger
90686a75a2
fix flux factor (additional crawl delay by access count) calculation
9 years ago
luc
4af27289e5
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
297fdb60d3
throw exception if crawler hostqueue can't create hostpath directory.
...
In rare cases hostname may not be a valid filesystem directory name,
which can't be created (e.g. containing '*' char). Prevent the crawl queue
from looping on this invalid entry by throwing a MalformedURLException.
9 years ago
luc
755efac17d
Use same max file size when loading all resource bytes or opening stream
...
content
9 years ago
luc
bc6c79fc12
Corrected scaling function for non RGB images.
9 years ago
luc
1565559df8
Refactoring : extracted write InputStream method.
9 years ago
luc
f0478bb14d
BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
...
imageio-bmp-3.2 library.
- better BMP format flavours support
- handle PNG encoded icons
- handle transparency
Added some javadoc url references to .classpath
9 years ago
luc
07437986e7
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
97cc03ef6a
start using a template for urlproxy header
...
It is included as iframe /proxmsg/urlproxyheader.html
to allow full servlet functionality and flexibility to display some
index/meta data in future.
9 years ago
luc
f01d49c37a
Process large or local file images dealing directly with content
...
InputStream.
9 years ago
luc
3c4c77099d
If available, check content length before downloading. Check also
...
content length is not over Integer.MAX_VALUE.
9 years ago
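The check described in the entry above might look like the following sketch. The class name, the method and the convention that a negative value means "length unknown" are assumptions for illustration, not the YaCy loader code.

```java
// Hypothetical pre-download check, for illustration only.
final class ContentLengthSketch {

    // Decides from the announced Content-Length whether a download should start.
    // A negative value stands for "header absent", in which case the size can
    // only be enforced while streaming.
    static boolean acceptable(final long contentLength, final long maxSize) {
        if (contentLength < 0) return true;
        // a Java byte[] cannot hold more than Integer.MAX_VALUE bytes
        if (contentLength > Integer.MAX_VALUE) return false;
        return contentLength <= maxSize;
    }

    public static void main(final String[] args) {
        System.out.println(acceptable(1024L, 10L * 1024 * 1024));
    }
}
```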
luc
5bbb2e1730
Ensure resource is closed when reading a full file InputStream
9 years ago
luc
6291a57300
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
0d3c5b223e
have psParser cleanup temp file
9 years ago
reger
7d0d19cb8e
avoid File.deleteOnExit() on temp files
...
JVM registers each file in a list regardless of already deleted and never
cleans up the list during runtime.
This accumulates to a considerable amount of mem during large crawls and/or
long uptime.
To tackle this, all temp files are now created in a subdir of java.io.tmpdir
and the jvm tmpdir property is set to this subdir, which is deleted by
code on shutdown.
Additionally let pdfParser use this tmp subdir too.
9 years ago
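A rough sketch of the approach in the entry above: create temp files inside one private directory and delete that directory in code, instead of registering every file with `File.deleteOnExit()`. All names are hypothetical. Unlike the commit, the sketch passes the directory explicitly rather than changing the `java.io.tmpdir` property, because whether a runtime change of that property takes effect depends on the JDK version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration of a private temp directory cleaned up in code.
final class TempDirSketch {

    // Creates a private working directory below the default temp location.
    static Path createWorkDir() {
        try {
            return Files.createTempDirectory("yacy-sketch-");
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temp file inside the working directory, without deleteOnExit().
    static Path newTempFile(final Path workDir, final String suffix) {
        try {
            return Files.createTempFile(workDir, "tmp", suffix);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the directory and everything in it. Reverse order visits
    // children before their parent directories.
    static void deleteRecursively(final Path dir) {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (final IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An application would call `deleteRecursively` once from its shutdown routine, so no per-file bookkeeping accumulates during a long crawl.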
luc
bfe51001e3
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
02e4489a23
set tmpfile.deleteOnExit by default,
...
to make sure files are removed on shutdown.
9 years ago
reger
2985baaa01
Exclude repetitive protocol part in tokenized url
...
used as description if none is avail. from parser.
9 years ago
reger
ca3d26a401
harmonize wordsintitle & CollectionSchema.title_words_val calculation,
...
remove obsolete partial init of wordreference from urimetadata
9 years ago
reger
52a9040ae6
Sort out double keywords (dc_subject) early in parsed documents
...
- by direct using Set vs. List
- remove not needed String[] getter
9 years ago
luc
49331dc523
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
47d70732f6
improve locale translator
...
- skip empty line
- robustness file section detection (space independent)
9 years ago
sixcooler
646afe9183
do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler
194df613de
not using 'location' as defaultfacetfield - since we removed it being
...
default.
9 years ago
sixcooler
d3b9349b6f
simplification / speedup of GenerationMemoryStrategy
9 years ago
sixcooler
4a905ec134
fix to not let the AccessTracker-Log grow too much, but have enough data
...
to monitor.
(+gitignore-correction)
9 years ago
reger
20e18d79f8
harmonize document title for archive parsers
9 years ago
luc
f11b5e8309
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
112ae013f4
update bzip and bzip parser process,
...
to return one document for the file with combined parser results of the
containing file and registers it with supplied url and mime of the archive.
9 years ago
reger
e76a90837b
update zip and tar parser process,
...
to return one document for the file with combined parser results of the
containing files.
9 years ago
luc
4e673ffc9a
Ensure closing of InputStream even when an exception occurs.
9 years ago
luc
10696b53f7
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
8532565c7d
optimize order of parsers to try
...
- start with a parser matching the remote supplied mime
9 years ago
reger
681889ae64
use current tar library for untar files
...
- remove old source copy
9 years ago
reger
5d71fc70e3
fix tarParser early exit on looping content
...
- adjust check of data available according to doc
- return null on no recognized content (to not exit TextParser next parser try)
- use commons.compress directly
9 years ago
luc
bcc2e7cb5b
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
2fcf6f104c
fix bzipParser recognition
...
- Bzip2Inputstream checks magic byte itself to identify bz2 (leave it in input)
- try to supply fitting mime for parsing bz2 content
9 years ago
luc
745e97a575
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
a60b1fb6c2
differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger
11f3666660
increase use of predefined CATCHALL_QUERY string
9 years ago